Key issues to consider when building an AB test platform

When designing an AB test platform, we need to pay attention to the following points so that AB testing is used correctly and delivers its full value.
1. Flexible grouping/bucketing
AB tests generally need to group users along various dimensions, so the grouping scheme must be both flexible (convenient to change and quick to iterate on) and effective (producing groups whose results can be evaluated with high confidence).
Specifically, users may be grouped at random or by dimensions such as app version, region, time, channel, age, gender, income, and behavior. The AB test platform must support multi-dimensional grouping.
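As a rough illustration of multi-dimensional grouping, the sketch below first filters users by dimension and then splits the eligible users with a salted hash. The names `Experiment`, `bucket`, and `assign`, and the shape of the user dictionary, are assumptions made for this example and are not taken from any particular platform.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Experiment:
    name: str
    variants: list                               # e.g. ["control", "treatment"]
    filters: dict = field(default_factory=dict)  # dimension -> set of allowed values


def bucket(user_id: str, salt: str, buckets: int = 100) -> int:
    """Map a user to a stable bucket in [0, buckets) using a salted hash."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def assign(user: dict, exp: Experiment):
    """Return a variant name, or None if the user is outside the targeted segment."""
    for dimension, allowed in exp.filters.items():
        if user.get(dimension) not in allowed:
            return None
    # Salting with the experiment name keeps splits of different experiments independent.
    idx = bucket(user["id"], exp.name, buckets=len(exp.variants))
    return exp.variants[idx]
```

A new dimension (version, channel, age band, and so on) can be added by putting another key into `filters`, without touching the splitting logic.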
2. The AB test must have statistically "significant" confidence
AB tests have a cost. Their purpose is to draw correct conclusions that improve the product experience and increase revenue conversion, so an observed improvement in a test metric only counts as real and effective if it is statistically "significant".
There are many statistical methods for verifying confidence. I will not go into details here; interested readers can search for relevant materials themselves.
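For readers who want a concrete starting point, one common choice for conversion-rate metrics is the two-proportion z-test. The sketch below is only one of the many methods mentioned above, and the function name and parameters are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test for the difference between two conversion rates.

    conv_a / n_a: conversions and users in group A; conv_b / n_b: same for group B.
    Returns (z, p_value). Assumes samples large enough for the normal approximation.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value
```

A result is usually called "significant" when `p_value` falls below a threshold chosen before the experiment starts (0.05 is a common convention, not a rule).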
3. Consistency of user experience
Among the AB test implementation plans described in the previous section, some (such as plan 3) give users a consistent experience within a certain period (a user who enters the page or uses the feature several times on the same day sees the same result each time), while others (such as the first type of plan 2) may show users a different result every time they enter the page or use the feature. Obviously, the former gives a consistent user experience and the latter does not.
My personal suggestion: where UI display and interaction are involved and users will enter or use the feature many times, it is better to use an AB test plan that gives a consistent experience. Services such as advertising, however, differ from scenario to scenario and do not need a consistent user experience.
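One way to get a consistent experience is to derive the variant from a hash of the user and experiment instead of drawing a random number on each request. The sketch below is a minimal illustration under that assumption; `sticky_variant` and its optional `day` parameter are hypothetical names, and passing a date makes the assignment stable only within that day.

```python
import hashlib
from datetime import date


def sticky_variant(user_id: str, experiment: str, variants: list, day: date = None) -> str:
    """Return the same variant for the same user and experiment on every call.

    If `day` is given, the assignment is stable within that day only and may
    change on another day; if omitted, it is stable for the experiment's lifetime.
    """
    key = f"{experiment}:{user_id}"
    if day is not None:
        key += f":{day.isoformat()}"
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]
```

Because nothing is stored, any server handling the request computes the same answer, which is what keeps repeated visits consistent.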
4. The test cycle should be long enough
For an AB test to reach a convincing conclusion, it needs to run for a sufficiently long cycle. Here are some examples to illustrate.
Take the optimization of UI and user interaction. A new UI or interaction method may feel fresh to users at first, but once the novelty wears off they may not be so enthusiastic about the feature (it is like having just found a girlfriend: at first you cannot wait to be together all the time, but after 2 years, 3 years, or even a few months, you may not feel this way). If we test for only a short period, find that the new feature is used more frequently, and conclude that the new version is better than the old one, we may be deceived by the data. The best practice here is to let the AB test run long enough for the results to stabilize and only then compare the core indicators. The specific duration needs to be determined by industry and experience, and when calculating the core indicators the initial data can be excluded so that the initial novelty does not affect the final evaluation.
In addition, in some industries users behave differently at different times. Take the video industry, especially video applications on smart TVs, where several people share one product and each has time at different hours: parents may have to go to work on weekdays, children may only have time to watch TV in the evening, and the elderly can watch all day. User behavior on weekends also differs from that on weekdays. This means the AB test cycle should not be just a certain time of day or just a few particular days; it is best an integer multiple of a week, which makes the conclusions more reliable.
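The two rules above (exclude the initial data, and evaluate over whole weeks) can be sketched as a small date calculation. The function name and the default warm-up of 7 days are assumptions for illustration; the right warm-up length depends on the product.

```python
from datetime import date, timedelta


def evaluation_window(start: date, end: date, warmup_days: int = 7):
    """Pick the days to include when computing core indicators.

    Drops the first `warmup_days` days (novelty period), then keeps as many
    whole weeks as fit before `end` (inclusive). Returns (first_day, last_day),
    or None if less than one full week remains after the warm-up.
    """
    first = start + timedelta(days=warmup_days)
    total_days = (end - first).days + 1          # inclusive day count
    full_weeks = total_days // 7
    if full_weeks < 1:
        return None
    last = first + timedelta(days=full_weeks * 7 - 1)
    return first, last
```

For an experiment running from 2024-01-01 to 2024-01-31 with a 7-day warm-up, this keeps 2024-01-08 through 2024-01-28, which is exactly three weeks.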
5. The principle of minimum loss
The purpose of AB testing is to optimize the user experience, but an optimization we believe to be effective may turn out badly once it is actually launched. To keep such a case from hurting user experience and revenue, we test new algorithms or optimization points on a small share of traffic. Once the data proves the optimization effective, we gradually roll it out to all users. If the data is poor during the experiment, at most the small group of tested users is affected, and there is no large negative impact.
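A gradual rollout of this kind can be sketched as a percentage gate plus a ramp schedule. The step values in `RAMP_STEPS` and both function names are hypothetical; the point is that a user admitted at a small percentage stays admitted as the percentage grows.

```python
import hashlib

# Hypothetical ramp schedule, in percent of traffic.
RAMP_STEPS = [1, 5, 20, 50, 100]


def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """True if the user falls inside the first `percent` of 100 hash buckets."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent


def next_step(current_percent: int, metrics_ok: bool) -> int:
    """Advance one ramp step when the data looks good; otherwise pull back to 0."""
    if not metrics_ok:
        return 0
    for step in RAMP_STEPS:
        if step > current_percent:
            return step
    return current_percent  # already at full rollout
```

Since the bucket of a user never changes, raising `percent` only adds users, so nobody flips back to the old behavior during the ramp.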
6. Handle the relationship between AB testing and caching
Internet companies use caching heavily to speed up queries and to improve the performance and availability of the whole system. When running an AB test on a functional module, caching in particular must be taken into account, because it can cause problems.
For example, suppose a user starts on the old algorithm strategy and is assigned the new algorithm strategy during the AB test. If results are cached, the user will still be served the old strategy's results from the cache, which is inconsistent with the new strategy actually assigned to the user.
The solution is to clear the cache and let the request go back to the source whenever the user's cached data is inconsistent with the user's actually assigned strategy. There are many possible implementations, and they depend on the specific business and on the AB test implementation plan, so they will not be described in detail here.
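As one possible implementation among many, the cache can store the strategy that produced each entry and evict the entry when it no longer matches the user's assignment. The class below is an in-memory sketch with made-up names; a real system would apply the same check on top of whatever cache it actually uses.

```python
class StrategyAwareCache:
    """Per-user cache that remembers which strategy produced each result."""

    def __init__(self, compute):
        # `compute(user_id, strategy)` stands in for the request to the origin.
        self._compute = compute
        self._store = {}  # user_id -> (strategy, result)

    def get(self, user_id, assigned_strategy):
        entry = self._store.get(user_id)
        if entry is not None:
            cached_strategy, result = entry
            if cached_strategy == assigned_strategy:
                return result               # cache hit, strategy still matches
            del self._store[user_id]        # stale: built by a different strategy
        # Miss or stale entry: go back to the source and cache the fresh result.
        result = self._compute(user_id, assigned_strategy)
        self._store[user_id] = (assigned_strategy, result)
        return result
```

An alternative with the same effect is to include the strategy name in the cache key, at the price of keeping old entries around until they expire.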


Origin blog.csdn.net/weixin_46033259/article/details/109670357