What is A/B testing? The interviewer stumped me!

What is A/B testing?

A/B testing means creating two (A/B) or more versions of a web or app interface or process, randomly exposing groups of visitors with the same (or similar) composition to these versions at the same time, collecting user-experience and business data from each group, and finally analyzing the results to identify the best-performing version, which is then officially adopted.

A/B testing purpose

  • Settle disputes between competing opinions in user experience (UX) design and choose the best solution based on actual results;
  • Find the real cause of a problem through controlled experiments and improve product design and operations;
  • Establish a data-driven, continuously optimizing closed-loop process;
  • Reduce the risk of launching new products or features and safeguard product innovation.

01 Basic steps of A/B testing

  • Analyze the current situation and form hypotheses:
    Analyze business data, identify the most critical points to improve right now, and propose optimization hypotheses.
  • Set goals and make a plan:
    Set a primary goal for judging which optimized version is better, and auxiliary goals for evaluating the impact of the optimized versions on other metrics.
  • Design and development:
    Produce design prototypes for two or more optimized versions and complete the technical implementation.
  • Distribute traffic:
    Decide the traffic split for each version under test. In the initial stage the optimized versions can receive a small share of traffic, which is gradually increased as results come in.

  • Collect and analyze data:
    Collect the experimental data and judge its validity and effect. If statistical significance reaches 95% or above and holds for a period of time, the experiment can end; if it stays below 95%, the test may need to run longer; if significance cannot reach 95% (or even 90%) after a long time, decide whether to terminate the test.
  • Make decisions:
    Based on the test results, release the new version, adjust the traffic split and keep testing, or, if the goal was not met, iterate on the optimization plan and develop a new online test.

The North Star Metric, also called the One Metric That Matters, is the single core indicator tied to the business/strategy at the product's current stage. Once established, it shines like the North Star in the sky, guiding the whole team in the same direction.

02 A/B testing principles and key points

Principle - Hypothesis Testing

Because an A/B test compares the difference between two sample means (or proportions) to determine whether the difference between the populations they represent is significant, it uses a test for the difference between two population means, i.e., the Z statistic. For details, see Hypothesis Testing.
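As an illustration, the two-proportion Z statistic described here can be computed by hand; the conversion counts below are made-up numbers for demonstration only:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(x1: int, n1: int, x2: int, n2: int):
    """Two-sided z-test for the difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)                        # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))  # standard error under H0
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))          # two-sided p-value
    return z, p_value

# made-up counts: 150/1000 conversions vs 130/1000 conversions
z, p = two_proportion_ztest(150, 1000, 130, 1000)
```

If the resulting p-value is below the chosen significance level (e.g. 0.05), the difference between the two groups is considered significant.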

Key points

  • Target KPI

In A/B testing we need to set a target KPI: the final indicator used to judge whether the A/B test was effective.

For example: increasing the click-through rate or the conversion rate by a given amount.
  • Strategy

To reach the target KPI we set, we need to adopt certain strategies: the strategy is what differs between groups A and B.

For example: changing product images, changing copy, and so on. In general, there are as many test variants as there are points of difference.
  • The role of A/B testing

Maximize the target KPI: find the strategy that is optimal for the KPI, ensuring the target KPI is maximized.

Follow-up analysis and accumulated know-how: because the population is heterogeneous, studying how different sub-populations respond to different strategies reveals each group's strategic preferences, which helps future personalized innovation and design.


03 Solutions to common problems in A/B testing

01 How to distribute traffic

Offline A/B testing in the retail industry is generally used to measure how different coupons change business metrics. Depending on how the coupons are configured, traffic can be allocated in different ways.

  • The coupon designs are similar: split traffic evenly, e.g. four strategy groups at 20% each plus a 20% control group
  • The coupon design is highly uncertain: minimize the test group, e.g. 10% test, 90% control
  • The coupon is already live and only its effect needs tracking: keep a small control group, e.g. 10% control, 90% test

Commonly used traffic-splitting methods

  • the rand() function in SQL
  • bucketing by the last digits (mantissa) of a random user ID

No matter how the split is done, users in the control group and the test group must be labeled so they can be identified in later analysis and statistics.
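The two splitting methods above can be sketched in Python; the 20% and 10% ratios and the function names are illustrative assumptions, not a prescribed implementation:

```python
import random

def split_by_mantissa(user_id: int) -> str:
    """Bucket by the last digit of a numeric user id: 0-1 -> test (20%)."""
    return "test" if user_id % 10 < 2 else "control"

def split_randomly(user_id: int, test_ratio: float = 0.1, seed: int = 42) -> str:
    """Seeded per-user randomness keeps the assignment reproducible."""
    rng = random.Random(f"{seed}:{user_id}")
    return "test" if rng.random() < test_ratio else "control"
```

Either way, the returned label should be stored with the user record so the groups can be separated at analysis time.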

02 How to determine the minimum number of people for testing

Random fluctuation:

Because test samples can never be perfectly identical, even two identical control groups (an A/A test) can produce different results. This is random fluctuation, and it can in turn distort the test's conclusions.

Minimum sample size:

To make the test results statistically significant at minimum cost, we must first ensure that the smallest group in the test reaches the minimum sample size needed to verify the effect. Many websites can calculate the minimum sample size for us, for example: A/B test sample size calculation website

  • Proportion-type target KPI:

Here α, the probability of rejecting a true null hypothesis, is called a Type I error; β, the probability of failing to reject a false null hypothesis, is called a Type II error.

  • Mean-type target KPI:

If both a proportion and a mean must be compared in the test, choose the larger of the two required sample sizes.
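Under this "choose the larger" rule, both sample sizes can be computed with statsmodels' power calculators; the 13% to 15% proportions and the Cohen's d of 0.05 below are assumed example values:

```python
# pip install statsmodels
from statsmodels.stats.power import NormalIndPower, TTestIndPower
from statsmodels.stats.proportion import proportion_effectsize

# proportion-type KPI: detect a lift from 13% to 15%
h = proportion_effectsize(0.15, 0.13)            # Cohen's h effect size
n_prop = NormalIndPower().solve_power(effect_size=h, alpha=0.05,
                                      power=0.8, ratio=1.0)

# mean-type KPI: assumed standardized effect (Cohen's d) of 0.05
n_mean = TTestIndPower().solve_power(effect_size=0.05, alpha=0.05,
                                     power=0.8, ratio=1.0)

# if both KPIs matter, size the test for the more demanding one
n_required = max(n_prop, n_mean)
```

Each `solve_power` call returns the per-group sample size needed for the chosen significance level (alpha) and power.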

03 How to avoid Simpson's paradox

Simpson's Paradox:

Two sets of data may each satisfy some property when examined separately, yet lead to the opposite conclusion once combined.

Causes of Simpson's Paradox:

Uneven traffic segmentation results in inconsistent user characteristics between the experimental group and the control group.

How to avoid Simpson's Paradox:

  • Split traffic correctly and sensibly, so that user characteristics in the experimental and control groups are consistent, representative, and reflect the overall user base.
  • Experimental design: if two variables both affect the results, put them on the same layer as mutually exclusive experiments, so that changes to one variable do not contaminate the test of the other.
For example: if we suspect an experiment may affect new and existing customers very differently, we should run separate targeted experiments on new customers and existing customers and compare the conclusions.
  • Experimental analysis: actively perform multi-dimensional segmented analysis. Besides the overall comparison, look at the results for segmented audience groups; do not generalize from a part to the whole, nor from the whole to a part.
For example: a variant may improve overall activity while reducing activity among young users, so is that variant really better? A variant that raises total revenue by a seemingly unremarkable 0.1% may have raised the purchase rate of young female iPhone users in Shanghai by 20%, and that experimental insight is very valuable.
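A small pandas sketch of the paradox, with made-up numbers in which version B wins inside every segment but loses in the aggregate, because B's traffic is concentrated in the low-converting segment:

```python
import pandas as pd

# hypothetical segment-level results (illustrative numbers only)
data = pd.DataFrame({
    "version":     ["A", "A", "B", "B"],
    "segment":     ["new_users", "old_users", "new_users", "old_users"],
    "conversions": [10, 400, 120, 45],
    "visitors":    [100, 1000, 1000, 100],
})
data["rate"] = data["conversions"] / data["visitors"]

# per-segment view: B beats A in both segments (12% > 10%, 45% > 40%)
by_segment = data.pivot(index="segment", columns="version", values="rate")

# aggregate view: A beats B overall (410/1100 vs 165/1100)
overall = data.groupby("version")[["conversions", "visitors"]].sum()
overall["rate"] = overall["conversions"] / overall["visitors"]
```

This is exactly why segmented analysis matters: the per-segment and aggregate views can disagree whenever traffic is split unevenly across segments.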

04 A/B test case

Background of the project

When the company runs ads for the website, the landing page users currently see offers access to course materials. The company has now launched a new landing page offering a free trial.

  • The current conversion rate averages around 13% over the year
  • The target conversion rate is 15%

a) Formulate a hypothesis

  • Null hypothesis: the new landing page converts no better than the old one
  • Alternative hypothesis: the new landing page converts better than the old one

b) Develop a plan

Select variable

Control group: sees the old page
Experimental group: sees the new page

Although we already know the old design's conversion rate (about 13%), we still run both versions side by side to rule out errors from other factors, such as seasonality and promotions. The two groups run under identical conditions except for the page design, so any difference between them can be attributed to the design.

Determine target KPIs

The conversion rate should rise from 13% to 15%; accordingly, we record whether or not each user converts.

Determine the number of people to test

More users give more accurate results, but we want to complete the A/B test at minimum cost while keeping the results valid, so we calculate the required number of test users.

The website above gives a required sample size of 4523 per group.

Computing it in Python gives about 4720.

We use 4523 here.
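For reference, the classical two-proportion sample-size formula (the kind such calculators typically use) can be coded directly; with 13% to 15%, α = 5%, and 80% power it gives roughly 4700+ per group, in line with the figures above:

```python
from math import ceil, sqrt
from statistics import NormalDist

def min_sample_size(p1: float, p2: float, alpha: float = 0.05,
                    power: float = 0.8) -> int:
    """Per-group sample size to detect a shift from p1 to p2 (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # ~0.84 for power = 0.8
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

n = min_sample_size(0.13, 0.15)   # per-group size for a 13% -> 15% lift
```

Small differences between tools (4523 vs roughly 4720) come from the approximation each one uses; either figure is a reasonable planning number.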

c) Distribute traffic

By hashing the userid, users are split into two groups: old_page and new_page.
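A minimal sketch of such hash-based bucketing; the salt string and the 50/50 split are assumptions for illustration:

```python
import hashlib

def assign_group(user_id: str, experiment: str = "landing_page_v1") -> str:
    """Deterministically bucket a user into old_page or new_page."""
    # salting with an experiment name keeps splits independent across experiments
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100     # stable bucket in [0, 100)
    return "new_page" if bucket < 50 else "old_page"
```

Because the hash is deterministic, a returning user always sees the same version, which is essential for a valid test.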

d) Collect data

Preparing and collecting the data requires cooperation with the engineering team. Here we sampled 4523 records for each group from the existing data.

Remove duplicate users.

We can verify that the page each user saw matches their assigned group, for both the test group and the control group.

Collect data from the experimental and control groups

View data

e) Analyze data

Calculate the conversion rate and standard deviation of the two groups
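One possible way to compute the per-group conversion rate and standard deviation with pandas; the eight rows below are made-up placeholder data, not the experiment's actual records:

```python
import pandas as pd

# placeholder data: one row per user, converted is 1 or 0
df = pd.DataFrame({
    "group": ["control"] * 4 + ["treatment"] * 4,
    "converted": [0, 1, 0, 0, 1, 1, 0, 0],
})

# the mean of a 0/1 column is the conversion rate; std is its sample deviation
summary = (df.groupby("group")["converted"]
             .agg(["mean", "std", "count"])
             .rename(columns={"mean": "conversion_rate"}))
```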

From the table above, we can see that the conversion rates of the two groups are very close, as shown in the figure below:

The experimental group's numbers are better than the control group's, but does that mean the experiment achieved our goal? To answer that, we need hypothesis testing.

Hypothesis test

Since the sample size for both landing pages exceeds 30, the z statistic is used.

  • Complete z-test using statsmodels.stats.weightstats
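One way to run this test is with statsmodels' proportions_ztest (from statsmodels.stats.proportion, an alternative to the weightstats route); the conversion counts below are illustrative, not the article's actual data:

```python
# pip install statsmodels
import numpy as np
from statsmodels.stats.proportion import proportions_ztest, proportion_confint

successes = np.array([586, 611])     # conversions: control, treatment (assumed)
nobs = np.array([4523, 4523])        # users observed per group

# two-sided z-test for equality of the two conversion rates
z_stat, p_value = proportions_ztest(count=successes, nobs=nobs)

# 95% confidence interval for each group's conversion rate
ci_low, ci_upp = proportion_confint(successes, nobs, alpha=0.05)
```

If `p_value` exceeds 0.05, the null hypothesis cannot be rejected and the two conversion rates are treated as statistically indistinguishable.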

Conclusion

  • p-value = 0.905 > 0.05, so the null hypothesis cannot be rejected: the new landing page does not bring better conversions than the old one.
  • The confidence intervals show that the control group's conversion rate is at its usual level, while the experimental group's conversion rate does not rise above it. In other words, the conversion rates of the control and experimental groups are very similar, and the new landing page brings no improvement.



Origin blog.csdn.net/jiangjunsss/article/details/133081307