One diagram to explain the methodology of causal inference: the master key for analysis when A/B testing is not possible

1. Background

During rapid product iteration, data analysts need to quantify how much marketing campaigns and product changes actually move business metrics, explore the causal relationship between product changes and business outcomes, and use the results to keep validating the direction of product iteration so that the business direction becomes clearer.

However, a product has its own natural growth trend as well as obvious seasonal and cyclical fluctuations. How can we remove the influence of these natural factors and of other confounding factors? The standard approach in the Internet industry is the A/B test. But when traffic is too low for certain metrics, or an A/B test cannot be run in a given scenario, statistical "causal inference" methods are becoming a new direction for evaluating Internet businesses. These methods are commonly used in behavioral science research to understand causality in observational data.

2. Methods for quantitatively evaluating effects

There are two general research design directions for quantitative evaluation of effects:

One is the A/B test. In experimental research, users are randomly assigned to experimental groups and a control group. The sample size needed to detect the expected effect is calculated in advance, and the result is evaluated once the experiment has collected that many samples. Because random assignment balances the influence of other disturbing variables across groups, the difference between groups can be read as the actual influence of the experimental factor on the outcome variable.

The second is observational research. Statistical correlation does not imply causality, and a causal relationship does not necessarily show up as a correlation, so it is not easy to find the factors that really affect the business. The A/B test also has limitations: it needs to occupy a sufficient amount of random traffic, and it needs to run for a period of time to collect data. When product traffic is small, an experiment takes a long time to complete and is labor-intensive. Given these limitations, the question becomes how to use the historical data at hand for "causal inference" analysis.
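The sample size step can be sketched as follows. This is a minimal illustration, assuming a two-sided test comparing two conversion rates with equal group sizes and the usual normal approximation; the function name `sample_size_per_group` and the example rates are hypothetical, not taken from the article.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.8):
    """Approximate users needed per group to detect p_control vs p_treatment.

    Assumes a two-sided z-test on two proportions with equal group sizes.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical example: baseline conversion 10%, hoping to detect a lift to 12%.
n_needed = sample_size_per_group(0.10, 0.12)
print(n_needed)
```

Halving the expected lift roughly quadruples the required sample, which is why low-traffic products often cannot afford an A/B test.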

3. Causal inference methods and applicable scenarios

Causal inference in observational research rests on the counterfactual framework: a counterfactual is a state contrary to what we can actually observe in reality (Rubin 1980). Causal inference is the process of drawing conclusions about causal relationships based on the conditions under which an effect occurs; it is the study of how to identify causality between variables more rigorously. In a causal relationship, the cause is partly responsible for the result, and the result partly depends on the cause.

Objective things generally have inherent causal connections, and people can understand things comprehensively and in essence only by understanding the causes and consequences of how they develop and change. When arguing a point, one can sometimes reason directly from the causal relationships inherent in things themselves; this is called causal reasoning. For decades, causal reasoning has been an important research topic in many fields, including statistics, computer science, education, public policy and economics. Overall, the insights from causal inference can help identify user pain points, provide direction for product iteration, and support a more personalized user experience.

The following figure summarizes the current methodological framework and the analysis scenarios each method addresses:

Here are the applicable scenarios of several methods. The difference-in-differences (DiD) method infers the effect of an intervention from how the change in outcomes before and after the intervention differs between the experimental group and the control group. A typical use case is a marketing campaign or new product feature rolled out in specific cities. Compare the before-and-after change in the promoted cities with the change in the non-promoted cities over the same period; assuming both groups would otherwise have followed parallel trends, the gap between the two changes is the estimated effect of the campaign.

Another very typical causal inference method is regression discontinuity design (RDD), which splits a continuous variable at a cutoff (the breakpoint) and checks whether the outcome variable jumps at that point. For example, it can be used to study how different pricing tiers affect users' purchasing decisions.
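A minimal sketch of the promoted-city comparison, using made-up daily order counts. The function name `did_estimate` and all numbers are hypothetical, and the estimate is only meaningful if the two groups would have moved in parallel without the campaign.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical daily orders before and after a campaign.
promoted_before = [100, 102, 98]
promoted_after = [120, 122, 118]
other_before = [90, 92, 88]
other_after = [100, 102, 98]

effect = did_estimate(promoted_before, promoted_after, other_before, other_after)

# Comparing only the post-campaign levels mixes the campaign effect
# with the pre-existing gap between the two groups of cities.
naive_gap = mean(promoted_after) - mean(other_after)

print(effect, naive_gap)
```

In this toy data the promoted cities were already ahead before the campaign, so the raw post-campaign gap overstates the effect; subtracting the control group's change removes both that gap and the shared natural growth.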
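The jump at the breakpoint can be estimated, in the simplest sharp-cutoff case, by fitting a straight line on each side of the cutoff and comparing the two fitted values at the cutoff. The sketch below assumes NumPy is available; the function name `rdd_jump`, the bandwidth and the simulated data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def rdd_jump(running, outcome, cutoff, bandwidth):
    """Local linear estimate of the outcome jump at `cutoff`.

    Fits one line to observations within `bandwidth` below the cutoff and
    another to those within `bandwidth` above, then takes the difference
    of the two intercepts at the cutoff.
    """
    centered = np.asarray(running, dtype=float) - cutoff
    outcome = np.asarray(outcome, dtype=float)
    left = (centered < 0) & (centered >= -bandwidth)
    right = (centered >= 0) & (centered <= bandwidth)
    # np.polyfit with deg=1 returns [slope, intercept].
    _, left_at_cutoff = np.polyfit(centered[left], outcome[left], 1)
    _, right_at_cutoff = np.polyfit(centered[right], outcome[right], 1)
    return right_at_cutoff - left_at_cutoff


# Simulated data with a known jump of 2.0 at a cutoff of 50.
rng = np.random.default_rng(1)
running = rng.uniform(0, 100, size=2000)
outcome = 0.02 * running + 2.0 * (running >= 50) + rng.normal(0, 0.3, size=2000)

jump = rdd_jump(running, outcome, cutoff=50, bandwidth=20)
print(round(jump, 2))
```

The bandwidth is a judgment call: a narrow window keeps the comparison local to the breakpoint but leaves fewer observations, so the estimate gets noisier.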

 

Consider the propensity score matching (PSM) method. At a meeting, a product manager presents user repurchase data collected after a new feature was added: the repurchase rate of users who used the new feature is 20% higher than that of users who did not. The product manager therefore believes the feature raised the repurchase rate and wants to promote it in the product. As a data analyst, how should you evaluate this claim? Users who choose to use a feature may differ from those who do not, so the raw gap mixes the feature's effect with those differences. The data mainly covers three aspects: the users' characteristic variables, whether they used the feature, and whether they repurchased. With a 1:1 matching ratio, 1116 pairs of users are finally matched. Among them, the repurchase rate is 24% in the treatment group and 13% in the control group, and the difference between the two groups is significant. The data therefore supports the view that the feature increases the repurchase rate. However, the uplift attributable to the feature is about 11 percentage points (24% versus 13%), not the 20% suggested by the raw comparison.

| Metric | Value |
| --- | --- |
| Number of matched pairs | 1116 |
| Treatment group (used the feature) repurchase rate | 0.24 |
| Control group (did not use the feature) repurchase rate | 0.13 |
| Mean difference | 0.11*** |
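The matching procedure can be sketched roughly as below. This is not the article's analysis or data: it simulates users with a single hypothetical characteristic (`activity`), estimates the propensity score with a hand-written logistic regression, and does greedy 1:1 nearest-neighbor matching within a caliper. NumPy is assumed, and every name, parameter and number in the block is an illustrative assumption.

```python
import numpy as np


def fit_propensity(X, treated, n_iter=25):
    """Logistic regression via Newton's method; returns P(treated | X)."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X1 @ beta))
        grad = X1.T @ (treated - p)
        hess = (X1 * (p * (1 - p))[:, None]).T @ X1 + 1e-8 * np.eye(X1.shape[1])
        beta += np.linalg.solve(hess, grad)
    return 1 / (1 + np.exp(-X1 @ beta))


def match_one_to_one(score, treated, caliper=0.02):
    """Greedy 1:1 nearest-neighbor matching on the score, without replacement."""
    treated_idx = np.flatnonzero(treated == 1)
    control_idx = np.flatnonzero(treated == 0)
    available = np.ones(len(control_idx), dtype=bool)
    pairs = []
    for t in treated_idx:
        dists = np.where(available, np.abs(score[control_idx] - score[t]), np.inf)
        j = int(np.argmin(dists))
        if dists[j] <= caliper:  # drop treated users with no close control
            available[j] = False
            pairs.append((t, control_idx[j]))
    return pairs


# Simulated users: more active users adopt the feature more often AND
# repurchase more often, so the raw comparison is confounded.
rng = np.random.default_rng(42)
n = 6000
activity = rng.normal(size=n)
used_feature = (rng.random(n) < 1 / (1 + np.exp(-(activity - 0.5)))).astype(int)
p_repurchase = 0.05 + 0.40 / (1 + np.exp(-2 * activity)) + 0.10 * used_feature
repurchased = (rng.random(n) < p_repurchase).astype(int)

score = fit_propensity(activity.reshape(-1, 1), used_feature)
pairs = match_one_to_one(score, used_feature)
t_idx, c_idx = map(np.array, zip(*pairs))

naive_diff = repurchased[used_feature == 1].mean() - repurchased[used_feature == 0].mean()
matched_diff = repurchased[t_idx].mean() - repurchased[c_idx].mean()
print(len(pairs), round(naive_diff, 3), round(matched_diff, 3))
```

The simulation builds in a true uplift of 0.10, so the matched difference should land near that value while the raw difference is inflated by the more active users who self-select into the feature. In practice the score model would use many characteristic variables, and covariate balance should be checked after matching.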

Other examples are the synthetic control method and the Bayesian structural time series method. Both rest on the same idea: check whether the time series of the outcome variable changes significantly after the intervention compared with before it. This design is called an interrupted time series design, and it is usually used for metric analysis in product retrospective reviews.
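The simplest form of an interrupted time series analysis is a segmented regression, sketched below. It is a much simpler stand-in for the synthetic control and Bayesian structural time series methods, not an implementation of either. It fits a pre-intervention trend and estimates a level shift and a trend change at the intervention point. NumPy is assumed; the function name `segmented_regression` and the simulated metric are hypothetical.

```python
import numpy as np


def segmented_regression(y, intervention_index):
    """Fit y = baseline + pre_trend*t + level_shift*post + trend_change*(t - t0)*post."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    post = (t >= intervention_index).astype(float)
    time_since = post * (t - intervention_index)
    X = np.column_stack([np.ones_like(t), t, post, time_since])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    names = ["baseline", "pre_trend", "level_shift", "trend_change"]
    return dict(zip(names, coef))


# Simulated daily metric: steady growth plus a jump of 8 on day 60.
rng = np.random.default_rng(7)
days = np.arange(100)
metric = 100 + 0.5 * days + 8.0 * (days >= 60) + rng.normal(0, 0.5, size=100)

fit = segmented_regression(metric, intervention_index=60)
print({name: round(value, 2) for name, value in fit.items()})
```

Because the pre-intervention trend is modeled explicitly, natural growth is not mistaken for an effect. Seasonality and other events that coincide with the intervention are not handled here, which is where a control series (synthetic control) or a structural time series model becomes useful.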

Original articles so far on business applications of causal inference:

Data Operations 36 Stratagems (2): How to use the synthetic control method to judge the effect of a strategy

Data Analysis 36 Stratagems (7): How the uplift model identifies marketing-sensitive user groups, implemented in Python

Data Operations 36 Stratagems (8): Regression discontinuity design (RDD) for evaluating the effect of product design

Data Analysis 36 Stratagems (9): Propensity score matching (PSM) for quantitative effect evaluation

Data Analysis 36 Stratagems (12): When an A/B test cannot be run, how to quantitatively evaluate the effects of marketing and product revisions on the business

Origin blog.csdn.net/Pylady/article/details/109699378