A BAT veteran walks you through the AB test

About the author

@Sven

BAT data messenger expert,

Has worked on user growth,

Currently responsible for the entire data pipeline.


As the Internet has matured, the demographic dividend has gradually disappeared. By the end of 2020, China had close to 1 billion Internet users, which already accounts for a large share of the population. Every market has been carved into sub-segments and vertical categories. Products keep moving down-market, reaching users in fourth- and fifth-tier cities. User groups are being subdivided as well, into new users, active users, and dormant and returning users. All of this is refined operation.


The goal of refined operation is growth. To grow better, you need to serve users better; to serve users well, you need to build products that meet the needs of different users and roll out different operating strategies. Measuring whether users like a product iteration, and whether an operating strategy works, requires a more scientific method. This is where the AB test comes in: across many iterations of features and strategies, it uses statistics to tell the business side whether each iteration is good or bad, and which direction to continue in.



01    What is the AB test


The AB test originated in medicine. When a drug is developed, medical staff need to confirm whether it is really effective. People in an experimental group and a control group take part in a drug trial, and the trial results are used to evaluate the drug's effect.


Specifically, a group of homogeneous subjects is randomly divided into two groups. Without knowing which one they are getting, one group receives the test drug and the other a placebo. After a period of observation, the data of the two groups are compared to see whether there is a significant difference, which determines whether the test drug is effective. This is the "double-blind experiment" of medicine.

The same is true in the Internet industry.


The business splits a web or app interface or flow into multiple versions. Traffic is then split, either by layering or by diversion, so that different groups of users see a different version of a feature or trigger a different strategy. The groups must be homogeneous. Therefore, whichever splitting method is used, users must be assigned randomly, and the same user cannot be in two groups of the same experiment.
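One common way to satisfy both requirements (random across users, stable for each user) is to hash the user ID. The sketch below is only an illustration under assumptions: the function name `assign_group`, the experiment name `new_button`, and the 100-bucket scheme are hypothetical and do not come from any particular platform.

```python
import hashlib

def assign_group(user_id: str, experiment: str,
                 groups=("control", "treatment")) -> str:
    """Map a user to exactly one group of an experiment, repeatably."""
    # Hash the experiment name together with the user id: the split looks
    # random across users but never changes for any single user.
    digest = hashlib.md5(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100            # bucket number in 0..99
    # Divide the 100 buckets as evenly as possible among the groups.
    return groups[bucket * len(groups) // 100]

if __name__ == "__main__":
    counts = {"control": 0, "treatment": 0}
    for i in range(10_000):
        counts[assign_group(f"user_{i}", "new_button")] += 1
    print(counts)   # the two counts should be close to each other
```

Because the group is computed from the ID rather than stored, the same user always lands in the same group, which is what keeps a user out of two groups at once.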


Before the experiment starts, we need to plan it and do the calculations: given the minimum detectable change, calculate the minimum sample size, and from the minimum sample size calculate the experiment period. Once the experiment traffic reaches the minimum sample size, we collect the relevant data, run a statistical test to reach a data conclusion, and combine it with a business evaluation. Whether the experiment shows a significant difference then determines whether the feature or strategy iteration should be rolled out more widely.


To sum up, the AB test is an experimental method that splits traffic and uses statistics, together with business evaluation goals, to assess whether a business iteration is effective.



02    Basic knowledge points of AB testing


• Traffic splitting: diversion or layering

Diversion: experiments do not share users. A slice of traffic is cut off and dedicated to one experiment.

This approach is not recommended, except for strategies that interfere with each other (such as issuing coupons).


A product's traffic is limited. If every experiment gets its own dedicated slice, what happens when there are many experiments?

With this kind of splitting, as the number of experiments grows, the sample available to each one shrinks and the experiment period gets longer. When the whole Internet industry is iterating fast and working overtime frantically, an experiment that takes a month or more will not make the leaders happy.


Layering: experiments share users, and a user takes part in multiple experiments at the same time.

This method puts each user in several experiments at once: experiments are stacked vertically as layers, and the buckets of each experiment are laid out horizontally within its layer. In every layer, users are randomly reshuffled. When experiments do not affect each other, layered traffic splitting is recommended, because it lets every experiment run on full traffic and greatly shortens the experiment period.
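The reshuffling in every layer can be sketched by salting the hash with the layer name, so that a user's bucket in one layer says nothing about their bucket in another. This is a minimal sketch under assumptions: the layer names `ui_layer` and `ranking_layer` and both helper functions are hypothetical.

```python
import hashlib
from collections import Counter

def bucket_in_layer(user_id: str, layer: str, n_buckets: int = 2) -> int:
    """Return the user's bucket within one layer; the layer name is the salt."""
    digest = hashlib.sha256(f"{layer}|{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets

def cross_table(n_users: int, layer_a: str, layer_b: str) -> Counter:
    """Count users in every (bucket in layer A, bucket in layer B) pair."""
    table = Counter()
    for i in range(n_users):
        uid = f"user_{i}"
        table[(bucket_in_layer(uid, layer_a), bucket_in_layer(uid, layer_b))] += 1
    return table

if __name__ == "__main__":
    # If the layers are independent, each of the 4 cells holds about 25% of users.
    for cell, count in sorted(cross_table(20_000, "ui_layer", "ranking_layer").items()):
        print(cell, count)
```

When the four cells are roughly equal, each bucket of one experiment contains an even mix of the other experiment's buckets, so the other experiment's effect is spread equally and cancels out in the comparison.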


• Sample size calculation

The minimum sample size is calculated in order to work out the experiment period. The question is how many samples are needed, under the expected conditions, to observe the smallest change we are willing to accept as detectable. The underlying principle can be found in any statistics textbook. Sample size calculations are based on the expected lift and the confidence level. There are many sample size calculators on the Internet, for example: https://www.eyeofcloud.com/124.html.
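For a rate-type metric, the calculation such calculators perform can be approximated with the standard two-proportion formula. The sketch below is an illustration, not the formula of any specific calculator; the defaults of 5% significance and 80% power are assumptions, as is the function name `min_sample_size`.

```python
import math
from statistics import NormalDist

def min_sample_size(p_base: float, p_target: float,
                    alpha: float = 0.05, power: float = 0.80) -> int:
    """Minimum users per group to detect p_base -> p_target (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # quantile for the significance level
    z_beta = NormalDist().inv_cdf(power)            # quantile for the desired power
    p_bar = (p_base + p_target) / 2                 # average rate of the two groups
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_base * (1 - p_base) + p_target * (1 - p_target))
    ) ** 2
    return math.ceil(numerator / (p_target - p_base) ** 2)

if __name__ == "__main__":
    # Baseline conversion 10%, hoping to detect a lift to 12%.
    print(min_sample_size(0.10, 0.12))
    # A smaller expected lift needs a much larger sample.
    print(min_sample_size(0.10, 0.11))
```

The required sample grows roughly with the inverse square of the lift, which is why small expected improvements lead to long experiments.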


• Hypothesis testing

First put forward a hypothesis about the experimental conclusion, then use the data collected after the experiment to judge whether the hypothesis holds. For hypothesis testing, you need to understand how to set up the hypotheses and how to test them.


• Two hypotheses

Null hypothesis H0: the hypothesis you want to reject in the experiment. Generally, it states that the experiment has no effect.

Alternative hypothesis H1: the hypothesis you want to support in the experiment. Generally, it states that the experiment has an effect.


• Test metrics

P value: the P value is the probability of observing a result at least as extreme as the measured one if the null hypothesis were true. By convention, anything below 5% is treated as a small-probability event, so a result with P < 0.05 is generally considered statistically significant.

Test statistic: when the absolute value of the test statistic is greater than or equal to 1.96 (the two-sided 5% critical value of the standard normal distribution), the difference is generally considered significant.

Z test: used to test the significance of a mean, or of the difference between means, when the population is normally distributed with known variance, or when the samples are large and independent. Generally speaking, rate-type metrics (such as conversion rate) can be tested with the Z test.

T test: used to test the significance of a mean, or of the difference between means, when the population is normally distributed with unknown variance, or when the samples are small and independent. Generally speaking, mean-type metrics can be tested with the T test.
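For a rate-type metric, the Z test and the P value can be worked through in a few lines. This is a minimal sketch of a pooled two-proportion Z test using only the standard library; the function name and the example counts are made up for illustration.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for H0: both groups have the same rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Under H0 the groups share one rate, so estimate it from the pooled data.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

if __name__ == "__main__":
    # Hypothetical data: control 500/5000 conversions, treatment 600/5000.
    z, p = two_proportion_z_test(500, 5000, 600, 5000)
    print(f"z = {z:.2f}, p = {p:.4f}")
    print("significant" if abs(z) >= 1.96 and p < 0.05 else "not significant")
```

The two criteria agree by construction: a two-sided P value below 0.05 corresponds to an absolute Z value above 1.96.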


• Confidence interval

A confidence interval is a range, computed from the sample, that is used to estimate a population parameter. The confidence level is the probability that an interval constructed this way contains the true parameter, and it expresses how reliable the estimate is. In an AB test, if both ends of the interval for the difference between groups have the same sign (both positive or both negative), the interval excludes zero and the effect can be judged significant. Generally speaking, a 95% confidence level is used for interval estimation.
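The "same sign" rule can be illustrated with a normal-approximation interval for the difference between two rates. This is a sketch under assumptions: the helper names and the example counts are hypothetical, and the approximation is only reasonable for large samples.

```python
import math
from statistics import NormalDist

def diff_confidence_interval(conv_a: int, n_a: int, conv_b: int, n_b: int,
                             level: float = 0.95):
    """Confidence interval for (rate of B - rate of A), normal approximation."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% level
    diff = p_b - p_a
    return diff - z * se, diff + z * se

def same_sign(low: float, high: float) -> bool:
    """True when the interval lies entirely above or entirely below zero."""
    return (low > 0 and high > 0) or (low < 0 and high < 0)

if __name__ == "__main__":
    low, high = diff_confidence_interval(500, 5000, 600, 5000)
    print(f"95% CI for the lift: [{low:.4f}, {high:.4f}]")
    print("significant" if same_sign(low, high) else "not significant")
```

Beyond the yes or no answer, the interval also shows how large the lift plausibly is, which helps with the business evaluation.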


03    AB test overall process design

Let's take a look at the overall AB test process:


• Before the experiment: set goals and determine the experimental design

Before starting the experiment, first decide its purpose and map that purpose to specific, measurable metrics. From the metric, calculate the minimum sample size. Then combine it with the traffic available to each bucket to calculate the experiment period.
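The last step is simple arithmetic. The sketch below assumes, for illustration only, that every day brings a fresh set of eligible users and that the experiment receives a fixed share of them; real traffic contains returning users, so treat the result as a lower bound. The function and parameter names are hypothetical.

```python
import math

def experiment_days(n_per_group: int, n_groups: int,
                    daily_users: int, traffic_share: float) -> int:
    """Days needed for every group to reach the minimum sample size."""
    users_needed = n_per_group * n_groups          # total sample across all groups
    users_per_day = daily_users * traffic_share    # users entering the experiment daily
    return math.ceil(users_needed / users_per_day)

if __name__ == "__main__":
    # Hypothetical numbers: 4,000 users per group, 2 groups,
    # 10,000 eligible users per day, 20% of traffic given to the experiment.
    print(experiment_days(4000, 2, 10_000, 0.2))
```

If the result is shorter than one user active interval, it is safer to extend the period so that weekly usage patterns are covered.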


• During the experiment: strategy and audience verification

For users, make sure each user is in only one bucket. For the strategy, verify that it is actually taking effect and that the traffic ratio matches expectations. Any problem should be fixed promptly.
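One way to check whether the traffic ratio matches expectations is a chi-square goodness-of-fit test on the group sizes. This is a sketch and not something the process above prescribes; the strict threshold `alpha=0.001` and the function name are assumptions chosen for illustration.

```python
import math

def traffic_ratio_check(n_control: int, n_treatment: int,
                        expected_share_control: float = 0.5,
                        alpha: float = 0.001):
    """Return (p_value, ok). ok is False when the split deviates from the plan."""
    total = n_control + n_treatment
    exp_control = total * expected_share_control
    exp_treatment = total - exp_control
    chi2 = ((n_control - exp_control) ** 2 / exp_control
            + (n_treatment - exp_treatment) ** 2 / exp_treatment)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p_value, p_value >= alpha

if __name__ == "__main__":
    print(traffic_ratio_check(5000, 5100))   # small deviation, plausibly chance
    print(traffic_ratio_check(5000, 5600))   # large deviation, worth investigating
```

A failed check usually points to a problem in assignment or logging rather than in the strategy itself, so it should be resolved before any effect is evaluated.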


• After the experiment: effect evaluation and retrospective

Before the experiment is completed, checking the data for significance is not recommended, because repeated checks increase the probability of a false positive.


After the experiment is completed, collect the data and test the relevant metrics for significance. Do not rely on statistical significance alone; confirm that the result is also significant for the business. Then decide whether the experimental strategy should be rolled out to all users.


Finally, document the problems and the things that went well in this experiment, to build up experience for the next one and to accumulate the methodology for building an AB test platform later (if we want one).


04    Pitfalls of the AB test


• Network effect

Social products, and products with both a consumer (C) side and a business (B) side, need to watch out for network effects. Because users are connected and influence each other, the experiment can be split by city. For example, run strategy A in Shanghai and strategy B in Beijing, so that the two groups do not influence each other.


• Novelty effect

When a feature first launches, some users try it out of curiosity, which makes the feature's data look better than it really is. For example, if a weak reminder button is changed to a strong reminder button, the numbers will almost certainly go up at first. In this case, look only at new users (NU), because new users carry no historical baggage.


• Time & sample homogeneity

Ensure that the experimental results are observed within the same time period.

Ensure the homogeneity of the sample and avoid uneven distribution.


• The experiment should run longer than the users' active interval

When doing an AB test, set a duration for the experiment. It should generally cover one active interval of the users, to ensure that all users are reached by the strategy.





Origin blog.51cto.com/13526224/2662140