What to Do When an AB Experiment Has Uneven Users: Practical Experience from the vivo Game Center Business

Author: Vivo Internet Data Analysis Team - Li Bingchao

AB experiments are among the most efficient ways to validate changes while a business is continuously iterating. When evaluating their effects, however, special attention must be paid to the problem of "uneven users": if it is overlooked, the conclusions of an analysis may be wrong, which poses serious risks to business decisions. To address this problem, our game business built on the stratified sampling (covariate balance algorithm) capability already implemented by the Hawking experimentation team and developed a "pre-user stratification" scheme based on user stratification logic. Together with the Hawking experimentation platform project team and the version release project team, we put the scheme into production and improved user uniformity in the game business's AB experiments. Drawing on real application cases, this article describes the reasoning, implementation principles, and results of the related models, in the hope of offering reference and inspiration to readers tackling user unevenness in their own fields.

1. Introduction

A business improves through continuous iteration, and AB experiments are one of the most efficient ways to verify each iteration. Analysts demonstrate the value of data by designing and optimizing experimental plans and evaluating the results of business experiments; this is one of a data analyst's core responsibilities, and it requires experimental plans and effect evaluations to be highly scientific and accurate. In practice, however, user unevenness directly affects the accuracy of an analyst's results, which in turn affects product decisions.

Over the past few years, the game business's analyst team has continuously explored solutions to user unevenness in AB experiments and has solved the problem well for the game business. This article first introduces the concept and impact of user unevenness as background, then, with the solution as its main thread, describes the practical results the game analyst team achieved in addressing it, and closes with an outlook.

2. What is user unevenness

2.1 What Is User Unevenness in AB Experiments

Business iteration based on AB experiments rests on a crucial premise: apart from the single product change being tested, all other relevant factors, especially the characteristics of the users themselves, must be the same in the two groups of the experiment. In other words, the distribution of user attributes in the two groups should be completely uniform.

User unevenness in a business AB experiment means that the experimental group and the control group, the two populations used to evaluate the business effect, differ substantially in the prior distribution of the core evaluation indicators. Typical causes include:

  • [Grouping method]: Some businesses assign groups directly by mobile phone ID, but phone IDs are correlated with phone model batches, so they are not sufficiently random;

  • [Population size]: When the population is too small, sampling cannot guarantee that users with different characteristics are spread randomly across the groups, resulting in an uneven distribution of users;

  • [Indicator characteristics]: Game payment indicators are highly sparse, non-normally distributed, and discontinuous, so conventional sampling methods struggle to ensure good uniformity.

As shown in the figure below (different colors represent users with different prior characteristics):

[Figure: experimental and control groups, with colors indicating users of different prior characteristics]

As a simple example, suppose that because experimental group A and control group B differ in the distribution of prior characteristics, A has historically outperformed B on the core business indicators (A > B). If these two populations are then used in an experiment and some indicator shows A > B during effect evaluation, is that improvement caused by the business strategy or by pre-existing differences between the users? The business decision is left in a dilemma.

2.2 Uneven users in Game Center AB experiments

In Game Center business iteration, AB experiments are used mainly in two scenarios: grayscale AB experiments for version iteration, and AB experiments for optimizing Game Center business strategies. Both scenarios have repeatedly run into uneven users in past experiments, but because their business purposes differ, the unevenness also takes different forms. The similarities and differences are described below.

2.2.1 Uneven users in the version iteration of Game Center

During Game Center version iteration, the main indicators observed are user activity in the Game Center, game downloads, and similar indicators. Users are grouped by the encrypted tail number of the mobile phone ID. This works well when traffic is large, but version iteration is characterized by small traffic and fast iteration. With small traffic, the experimental and control groups end up with different numbers of users at different levels of activity and download behavior, so the two groups are uneven on some core indicators, which affects the final decision on whether to scale up the version.

2.2.2 User Unevenness in Game Center Strategy Experiments

The game business is the company's main revenue-generating business. In Game Center strategy experiments, besides activity and download indicators, the downstream changes in game revenue indicators must also be observed. As noted above, the uniformity of activity and download indicators can be guaranteed when traffic is large; strategy experiments generally have relatively large traffic, and historical data confirm that at this traffic level the activity and distribution indicators are indeed uniform.

Game revenue, however, is a special business model that behaves very differently from active users and game downloads, with the following particularities:

  • [Limited number of paying users]: Only a small fraction of active game users pay. Two randomly selected groups with the same number of active users can still differ markedly in the number of paying users, and high-paying users in particular can be distributed differently between the two groups.

  • [Non-normal revenue distribution]: Over any given period, game users' payments span a very wide range, yet most users pay little, so differences in just a few extremely high-paying users can have a large impact on overall revenue.

  • [Payment level is not fixed]: A user's payments are strongly tied to the games being played, so the payment level depends not only on the user but also on the games the user has been playing recently; that is, it changes constantly over time.

  • [Discontinuous distribution of high payment values]: "High-paying user" is defined as a range, and there are few such users, so their payment values are not continuous. Noticeable differences between individual high-paying users can still have a large impact on the overall result.

Therefore, even with the large traffic of strategy experiments, the uniformity of revenue indicators during an experiment cannot be guaranteed. The reasons fall into two categories:

  1. High-paying and low-paying users are unevenly distributed between the experimental and control groups, so the number of paying users differs substantially.

  2. The payment values of high-paying users are discontinuously distributed, so even when paying users are evenly distributed across the groups, a gap in payment values remains.
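Both causes can be illustrated with a small simulation. This is a hypothetical sketch, not Game Center data: the population size, the payer and "whale" rates, and the Pareto-shaped payment amounts are all assumptions chosen only to mimic sparse, heavy-tailed revenue. It splits users 50/50 purely at random and reports how many high payers land in each group and how far the two groups' ARPU drift apart.

```python
import random


def simulate_random_split(n_users=100_000, payer_rate=0.05, whale_rate=0.002, seed=7):
    """Split a synthetic population 50/50 at random and compare the two groups.

    All rates and payment scales are hypothetical; payments are drawn from
    Pareto distributions so that a few "whales" dominate total revenue.
    """
    rng = random.Random(seed)
    stats = {g: {"users": 0, "payers": 0, "whales": 0, "revenue": 0.0} for g in "AB"}
    for _ in range(n_users):
        g = stats[rng.choice("AB")]  # pure random assignment, no stratification
        g["users"] += 1
        u = rng.random()
        if u < whale_rate:  # rare, very high payers
            g["payers"] += 1
            g["whales"] += 1
            g["revenue"] += 2000 * rng.paretovariate(1.5)
        elif u < payer_rate:  # ordinary payers
            g["payers"] += 1
            g["revenue"] += 20 * rng.paretovariate(2.5)
        # everyone else pays nothing
    arpu = {k: v["revenue"] / v["users"] for k, v in stats.items()}
    rel_gap = arpu["A"] / arpu["B"] - 1  # relative ARPU gap, with B as the baseline
    return stats, arpu, rel_gap


stats, arpu, gap = simulate_random_split()
print("whales:", stats["A"]["whales"], stats["B"]["whales"], f"ARPU gap: {gap:+.1%}")
```

Re-running with different seeds shows how much the ARPU gap moves from one random split to the next even though the two groups are the same size, which is the kind of noise the stratification model in Section 5 is meant to reduce.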

3. Impact of uneven users

As the company's dedicated game distribution platform, vivo Game Center must continuously optimize and iterate its products to serve its users better. AB experiments are the main way to verify effects: by comparing the core indicators the business cares about, the best feature or version is selected for full rollout. User unevenness, however, has a considerable impact on this closed loop, mainly in the following respects:

1. Reduced interpretability of business data, biasing conclusions about business effects

As the table below shows, in several past Game Center strategy experiments the strategy had no direct effect on revenue indicators, yet the revenue indicators fluctuated by more than 10%. In such cases, the experiment's revenue data cannot be used at all to evaluate the strategy's impact on revenue.

Here, ARPU is defined as:

$$\mathrm{ARPU} = \frac{\text{amount paid after game download by users active during the experiment period}}{\text{number of active users}}$$

[Table: revenue-indicator fluctuations in past Game Center strategy experiments]
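The ARPU definition can be expressed as a short function. This is a minimal sketch assuming a hypothetical per-user record with an activity flag and the amount paid after game download during the experiment period; the field names are illustrative, not taken from the article.

```python
from dataclasses import dataclass


@dataclass
class UserRecord:
    # Hypothetical per-user record for the experiment period.
    user_id: str
    active: bool
    paid_after_download: float  # amount paid after downloading games in the period


def active_user_arpu(records):
    """ARPU = payment after game download by active users / number of active users."""
    active = [r for r in records if r.active]
    if not active:
        raise ValueError("no active users in this group")
    return sum(r.paid_after_download for r in active) / len(active)


group = [
    UserRecord("u1", True, 0.0),
    UserRecord("u2", True, 30.0),
    UserRecord("u3", True, 0.0),
    UserRecord("u4", False, 12.0),  # inactive: excluded from numerator and denominator
]
print(active_user_arpu(group))  # 10.0
```

Because most active users contribute zero to the numerator, a single extra high payer in one group can move its ARPU noticeably, which is why this indicator is so sensitive to uneven users.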

2. Risk of adopting the wrong business strategy

The main risk is that strategies with negative effects get scaled up, while strategies with positive effects cannot be scaled up in time.

3. Inefficient grayscale releases, with a lot of manpower wasted on troubleshooting anomalies

In Game Center grayscale version releases, 8 to 10 version anomalies a year are caused by uneven users, and each investigation takes a total of 5 person-days across all parties, adding up to 40+ person-days per year.

Therefore, solving user unevenness scientifically and rationally when evaluating Game Center AB experiments is of great significance for evaluating business effects across the entire Game Center.

4. How to deal with the problem of uneven users

User unevenness in AB experiments is a problem data analysts have always faced when evaluating experimental effects. Together with colleagues, we researched and tried several different solutions:

[Table: comparison of the candidate solutions]

The comparison above shows that the "pre-user stratification" model, based on user stratification logic, is the most scientific, reasonable, and stable solution at this stage.

5. Introduction to the Game Business Solution: the "Pre-User Stratification" Model

This section introduces the "pre-user stratification" model, covering its design, its product implementation, and its results in the Game Center business, so that readers can see its logic and effect directly.

5.1 Introduction to the Stratification Model

As noted in the discussion of user unevenness, AB experiments in both the version grayscale scenario and the strategy optimization scenario suffer from user unevenness, but in different ways. For these two scenarios we therefore built, on user stratification logic, one stratification model for distribution indicators and one for revenue indicators. When sampling the experimental population, the experimental group and the control group draw the same number of users from each user stratum, which addresses user unevenness in business effect evaluation.

Conventional stratified sampling logic: suppose the number of active users in the whole market is N, the number of active users in stratum i after stratification is N_i, and each group samples n users during the experiment. Then the number of users drawn from stratum i into the experimental group should be:

$$n_i = n \cdot \frac{N_i}{N}$$
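The proportional allocation n_i = n · N_i / N rarely produces whole numbers. The sketch below is one way to turn it into integer quotas, assuming largest-remainder rounding (the article does not say how fractional quotas are rounded); the stratum names and sizes are hypothetical. The experimental and control groups would both use the same quotas, so each group contains the same number of users from every stratum.

```python
import math


def proportional_allocation(strata_sizes, n):
    """Return per-stratum sample sizes n_i close to n * N_i / N that sum exactly to n.

    Fractional quotas are resolved with largest-remainder rounding (an assumption).
    """
    N = sum(strata_sizes.values())
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the population size")
    exact = {k: n * size / N for k, size in strata_sizes.items()}
    alloc = {k: math.floor(v) for k, v in exact.items()}
    leftover = n - sum(alloc.values())
    # Give the remaining slots to the strata with the largest fractional parts.
    for k in sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)[:leftover]:
        alloc[k] += 1
    return alloc


sizes = {"non_payer": 900_000, "low_payer": 90_000, "high_payer": 10_000}
print(proportional_allocation(sizes, 50_000))
```

With these sizes the quotas are exactly proportional: 45,000 non-payers, 4,500 low payers, and 500 high payers per group.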

5.1.1 User Game Revenue Stratification Model

Among the core payment indicators in Game Center strategy AB experiments, active-user ARPU is the one most affected by user unevenness. Based on the definition of this indicator, we selected several intermediate variables as the basis for stratification. Users were first grouped by each single indicator according to its distribution among Game Center active users across the whole market, and the indicators were then cross-combined to form the final stratification scheme.

[Figure: user game revenue stratification scheme]
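The "single indicator first, then cross-combine" procedure can be sketched as follows. The indicator names, thresholds, and toy users are hypothetical; the article does not disclose which intermediate variables were used. In this sketch a sparse payment indicator is binned with fixed thresholds (quantile cut points would mostly collapse onto zero), a dense activity indicator is split at its population median, and each user's stratum is the tuple of its per-indicator bins.

```python
from bisect import bisect_left
from collections import Counter
from statistics import quantiles

# Hypothetical active users; indicator names and values are illustrative only.
users = [
    {"id": "u1", "pay_30d": 0.0, "active_days_30d": 1},
    {"id": "u2", "pay_30d": 0.0, "active_days_30d": 2},
    {"id": "u3", "pay_30d": 0.0, "active_days_30d": 3},
    {"id": "u4", "pay_30d": 0.0, "active_days_30d": 5},
    {"id": "u5", "pay_30d": 0.0, "active_days_30d": 8},
    {"id": "u6", "pay_30d": 6.0, "active_days_30d": 13},
    {"id": "u7", "pay_30d": 30.0, "active_days_30d": 21},
    {"id": "u8", "pay_30d": 500.0, "active_days_30d": 30},
]


def quantile_cuts(users, indicator, bins):
    """Cut points for one indicator, taken from its distribution over all active users."""
    values = [u[indicator] for u in users]
    return sorted(set(quantiles(values, n=bins, method="inclusive")))  # drop duplicate cuts


def stratum_key(user, cuts):
    """Bin each indicator on its own (right-closed intervals), then cross-combine the bins."""
    return tuple(bisect_left(c, user[name]) for name, c in cuts.items())


cuts = {
    # Sparse payment indicator: fixed thresholds (non-payer / payer / high payer).
    "pay_30d": [0.0, 100.0],
    # Dense activity indicator: split at the population median.
    "active_days_30d": quantile_cuts(users, "active_days_30d", bins=2),
}
strata = Counter(stratum_key(u, cuts) for u in users)
print(strata)
```

The eight toy users fall into four strata: (0, 0) with four users, (1, 1) with two, and (0, 1) and (2, 1) with one each.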

5.1.2 User Distribution Stratification Model

For the core distribution indicators that are affected by user unevenness during version grayscale experiments, we likewise selected several intermediate variables as the basis for stratification, and then formed the final stratification scheme in the same way as for the revenue stratification model.

[Figure: user distribution stratification scheme]

5.2 Product Implementation of the Stratification Model

After the user stratification model was built on the data side, using it for traffic splitting in experiments and grayscale releases required support from the product platforms. We therefore worked with the Hawking experimentation platform and version release system teams, whose colleagues developed the relevant features and connected our user stratification model to the Hawking experimentation platform and the version release system respectively. Users in experiments and version grayscale releases are now split according to user stratification logic, which ensures user uniformity across the groups and improves the scientific rigor and accuracy of downstream effect evaluation.

A schematic of the traffic-splitting logic is shown below (the four colors represent populations in different feature strata):

[Figure: traffic-splitting logic based on user strata]
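The splitting logic in the schematic can be approximated as follows. This is a hypothetical sketch, not the Hawking platform's implementation (Reference [2] describes that): within each stratum, users are ordered by a salted SHA-256 hash of their ID, so the order is stable but effectively random, and are then dealt to the groups round-robin. The salt value and function names are illustrative. Continuing the round-robin counter across strata keeps both per-stratum and overall group sizes within one user of each other.

```python
import hashlib
from collections import defaultdict


def salted_hash(user_id, salt):
    """Stable, effectively random ordering key: same user and salt give the same hash every run."""
    return hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()


def split_by_stratum(user_strata, groups, salt="exp-001"):
    """Assign users so each group receives (almost) the same number of users from every stratum.

    user_strata maps user_id -> stratum label. Within a stratum, users are ordered by a
    salted hash and dealt to the groups round-robin.
    """
    by_stratum = defaultdict(list)
    for uid, stratum in user_strata.items():
        by_stratum[stratum].append(uid)

    assignment = {}
    k = 0  # keep counting across strata so odd leftovers do not always go to groups[0]
    for stratum in sorted(by_stratum, key=str):
        for uid in sorted(by_stratum[stratum], key=lambda u: salted_hash(u, salt)):
            assignment[uid] = groups[k % len(groups)]
            k += 1
    return assignment


# Hypothetical population: 10 payers and 90 non-payers.
user_strata = {f"u{i}": ("payer" if i % 10 == 0 else "non_payer") for i in range(100)}
assignment = split_by_stratum(user_strata, ["A", "B"])
for stratum in ("payer", "non_payer"):
    for g in ("A", "B"):
        n = sum(1 for u, grp in assignment.items() if grp == g and user_strata[u] == stratum)
        print(stratum, g, n)
```

Each group receives exactly 5 payers and 45 non-payers, whereas a plain random split of 100 users could easily give one group noticeably more payers than the other.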

For details of how the platform features were implemented, see Reference [2].

5.3 Test Results of "Pre-User Stratification"

After the relevant features of the Hawking experimentation platform and the version release system went live, the data analysis team ran AA experiments on each platform to verify whether user stratification reduced user unevenness as expected.

5.3.1 AA test conclusions for the Hawking experimentation platform

The user stratification model greatly improves the uniformity of the Game Center's experimental revenue data without affecting the uniformity of the existing distribution indicators.

  • Distribution uniformity: under both splitting logics, fluctuations in distribution-related indicators are not significant, but under user stratification their absolute fluctuations are much smaller than under hash-based splitting.

  • Revenue uniformity: under hash-based grouping, revenue ARPU1 fluctuated by 11.6%; under user stratified sampling, revenue ARPU1 fluctuated by 4.8% and 1.9% in the two experimental groups, and revenue ARPU2 by 3.3% and 1.5%, a large improvement in uniformity.

Remarks: revenue indicators and activity-related indicators are compared by relative change, while distribution indicators are compared by absolute change; revenue ARPU1 and ARPU2 use different revenue calculation logic.
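The two kinds of change in the remark can be written as two small helpers. The values in the usage lines are made up for illustration, and the assumption that distribution indicators are rates (such as a download conversion rate), which is why a plain difference is reported for them, is ours rather than the article's.

```python
def relative_change(test, control):
    """Relative change, used for revenue and activity indicators: (test - control) / control."""
    if control == 0:
        raise ValueError("control value is zero; relative change is undefined")
    return (test - control) / control


def absolute_change(test, control):
    """Absolute change, used for distribution indicators (assumed here to be rates)."""
    return test - control


print(f"{relative_change(1.116, 1.0):.1%}")    # 11.6%
print(f"{absolute_change(0.231, 0.228):.3f}")  # 0.003
```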

5.3.2 AA test conclusions for the version release system

For the uniformity of distribution indicators, the user stratification model outperforms the original method of splitting by the encrypted tail number of the mobile phone ID.

  • Distribution uniformity: under user stratification, no distribution indicator fluctuates significantly; under splitting by encrypted phone-ID tail number, the game-distribution indicator E changes significantly, meaning users are uneven on this indicator.

[Figure: AA test results for distribution indicators in the version release system]
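The article does not say which significance test was used in the AA tests. For a rate-type distribution indicator, one common choice is the pooled two-proportion z-test sketched below; the counts are hypothetical. This test is not suited to heavy-tailed revenue indicators such as ARPU, whose non-normality is discussed in Section 2.2.2.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided pooled z-test for a difference between two proportions."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided: 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical AA-test counts: users who downloaded a game / users in each group.
print(two_proportion_z_test(5000, 50_000, 5000, 50_000))  # identical rates: z = 0, p = 1
print(two_proportion_z_test(5300, 50_000, 5000, 50_000))  # 10.6% vs 10.0%: z is about 3.1
```

Since no treatment is applied in an AA test, a p-value below the chosen significance level on an indicator such as E points to user unevenness between the groups rather than to a real effect.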

5.4 Benefits after launch

After going live for Game Center business experiments and grayscale releases, the "pre-user stratification" model has brought significant benefits in the following areas:

  1. [Effective grayscale release rate up 9 pt]: After the stratification logic went live, the share of effective (user-uniform) grayscale releases in the Game Center rose from 86% to 95%, and uneven releases fell from 10 per year to 2 to 3 per year (only one in the past six months).

  2. [35 person-days/year saved on anomaly troubleshooting]: Release anomalies fell by 7 per year; since each investigation takes a total of 5 person-days across all parties, this saves 35 person-days per year on version anomaly troubleshooting.

  3. [Positive strategy experiments bring annual Game Center revenue +0.2% earlier]: With the stratification logic in place, positive strategies can reach a conclusion and be rolled out to full traffic earlier, which can add +0.1% to the Game Center's annual game distribution and +0.2% to its annual game revenue.

  4. [Negative strategy experiments taken offline promptly, reducing revenue loss]: Judgments on negative experiments become more timely and accurate, reducing the revenue lost to prolonged observation of negative experiments; this loss amounts to about 0.1% of the Game Center's annual revenue.

6. Summary and Outlook

To tackle uneven users in AB experiments, we drew on past experience and, through continuous trial and exploration, developed a "pre-user stratification" model based on user stratification logic. With strong support from the Hawking project team and the version release system project team, and with each scenario handled differently, it has worked well against user unevenness in Game Center AB experiments. In the version grayscale scenario, it essentially resolves the unevenness of distribution indicators. In strategy experiments, however, because of the particularities of game revenue data, user stratification resolves the uneven distribution of high-paying and low-paying users across groups but cannot fully resolve the discontinuity of high payment values. Revenue fluctuation therefore remains at 1% to 2%, though this is far below the fluctuation range under the original splitting method.

In addition, the "pre-user stratification" scheme used at this stage greatly increases the probability of uniform users but cannot eliminate user unevenness entirely. The reasons are twofold: on the one hand, manual stratification logic is relatively subjective; on the other hand, few indicators were selected, so the information it relies on is not comprehensive enough. Going forward, we will keep exploring, incorporating more indicator information and applying machine learning and other models to build the user stratification system, in order to further improve user uniformity in the game business.

Finally, I hope this article offers reference and inspiration for solving the problem of uneven users in AB experiments in other businesses.

References:

  1. Mao Shisong, Wang Jinglong, Pu Xiaolong. Advanced Mathematical Statistics (2nd Edition).

  2. vivo Internet Technology. "Design and Practice of the vivo Hawking Experimentation Platform (Platform Product Series 02)".


Source: blog.csdn.net/vivo_tech/article/details/132077561