Python data analysis case 21 - Analyzing the impact of link previews on link click-through rate (A/B test)

An A/B test is essentially the controlled experiment familiar from a school biology class. Split users into groups, apply one plan to each group (changing only a single variable between plans), and observe the users' responses over the same time window (reflected in business metrics and user-experience metrics). Note that the groups should be composed as similarly as possible; for example, new and old users are likely to show large differences in preferences. Finally, hypothesis tests determine which versions differ statistically from the original version, and the best-performing version is chosen according to the effect size.

To put it bluntly, it is a controlled experiment: one group of users receives no treatment, another group receives the special treatment, and then the two groups' behavior is compared to see whether it differs significantly. If the treated group's behavior improves, the treatment works.

The core method is the classic two-sample t-test from traditional statistics, comparing the mean of a metric between two groups. An "A/B test" may sound mysterious, but at its heart it is the most basic t-test in statistics.
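As a minimal illustration on synthetic data (not the case data), a pooled two-sample t-test can be run with `scipy.stats.ttest_ind`:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# hypothetical metric, e.g. daily clicks, for two groups of users
control = rng.normal(loc=10.0, scale=2.0, size=1000)
treatment = rng.normal(loc=11.0, scale=2.0, size=1000)

# equal_var=True gives the classic pooled t-test used throughout this case
t_stat, p_value = stats.ttest_ind(control, treatment, equal_var=True)
print(f"t = {t_stat:.3f}, p = {p_value:.2e}")
if p_value < 0.05:
    print("Reject H0: the group means differ significantly")
```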


Case background:

An instant messaging company wanted to A/B test how providing previews of shared links within the app would affect the links' click-through rate. Users were randomly divided into three variants: a control group and two treatment groups.

Variant description

There were three groups in the experiment: two treatment groups and one control group, each with 10,000 users. A description of each variant follows:

  • Treatment A: provides a preview of the content and the shared link.
  • Treatment B: provides a preview and thumbnail of the content and the shared link.
  • Control group: prints only the shared link itself in plain text.
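The source does not show how users were assigned, but random assignment into three equal groups of 10,000 can be sketched as follows (the 0/1/2 labels match the `variant` coding used in the data below; the user ids are hypothetical):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_per_group = 10000
# shuffle 30,000 hypothetical user ids across the three variants
variant = rng.permutation(np.repeat([0, 1, 2], n_per_group))
assignment = pd.DataFrame({'user_id': np.arange(3 * n_per_group), 'variant': variant})
print(assignment['variant'].value_counts().sort_index())
```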

 

Compare means of user characteristics

We will test whether each of the following 16 user characteristics differs between any pair of groups. For example, users' age can be compared between the control group and treatment A, between the control group and treatment B, and between treatments A and B. The characteristic variables are defined in the table below.


Control analysis

Import packages and read the data:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

plt.rcParams['axes.unicode_minus'] = False   # render minus signs correctly
pd.set_option('display.precision', 4)        # the bare 'precision' alias is deprecated in newer pandas
# pd.set_option('display.max_columns', 40)
data = pd.read_csv('data.csv', parse_dates=['regDate'])
data.head()

The variable variant in the first column identifies the group, with three values 0, 1, and 2 corresponding to the three groups.

View basic information about the data:

data.info()

 

Next, process the time variable, converting the registration date into the number of days elapsed since 2014-01-01:

data['regDate'] = (data['regDate'] - pd.to_datetime('2014-01-01')).dt.days  # days since 2014-01-01
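The subtraction yields a timedelta column, from which whole days are extracted. The behavior can be checked on a small synthetic frame:

```python
import pandas as pd

df = pd.DataFrame({'regDate': pd.to_datetime(['2014-01-01', '2014-02-01', '2015-01-01'])})
# subtracting a Timestamp yields timedeltas; .dt.days extracts whole days
df['regDate'] = (df['regDate'] - pd.to_datetime('2014-01-01')).dt.days
print(df['regDate'].tolist())  # → [0, 31, 365]
```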

Draw boxplots comparing every feature across the groups:

dis_cols = 4
dis_rows = int(np.ceil((len(data.columns) - 1) / dis_cols))  # enough rows for all features
plt.figure(figsize=(3 * dis_cols, 3 * dis_rows), dpi=256)
for i in range(len(data.columns) - 1):
    plt.subplot(dis_rows, dis_cols, i + 1)
    sns.boxplot(x='variant', y=data.columns[i + 1], width=0.8, orient='v', data=data)
    plt.xlabel('different groups', fontsize=8)
    plt.ylabel(data.columns[i + 1], fontsize=12)
plt.tight_layout()
plt.show()

Calculate the mean of each group:

data.groupby('variant').mean()

Calculate the standard deviation of each group:

data.groupby('variant').std()

The descriptive statistics above show that the standard deviations of the groups are close to one another and the means are similar. The equal-variance assumption is therefore adopted for the t-tests below.
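Beyond eyeballing the standard deviations, equal variances can be checked formally with Levene's test. A sketch on synthetic groups follows (on the real data it would be `stats.levene(x0, x1, x2)` for each feature):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# three synthetic groups with the same spread, mimicking the similar stds observed
g0, g1, g2 = (rng.normal(50, 10, 10000) for _ in range(3))

w_stat, p = stats.levene(g0, g1, g2)
print(f"Levene W = {w_stat:.3f}, p = {p:.3f}")
# a large p gives no evidence against equal variances, supporting the pooled t-test
```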


t-test and confidence interval

Import the package and prepare a data frame to store the results:

import statsmodels.stats.api as sms
df_result=pd.DataFrame(columns=['Compare','Feature','t_statistic','p_value','df','CI_low','CI_up'])

For each feature, compute the t-test for every pair of groups:

rows = []
for c in data.columns[1:]:
    print(c)
    x0 = data[data['variant'] == 0][c]
    x1 = data[data['variant'] == 1][c]
    x2 = data[data['variant'] == 2][c]
    pairs = {'Group 0vs.1': (x0, x1), 'Group 0vs.2': (x0, x2), 'Group 1vs.2': (x1, x2)}
    for name, (a, b) in pairs.items():
        cm = sms.CompareMeans(sms.DescrStatsW(a), sms.DescrStatsW(b))
        t_stat, p_val, dof = cm.ttest_ind(alternative='two-sided', usevar='pooled')
        ci_low, ci_up = cm.zconfint_diff(alpha=0.05, alternative='two-sided', usevar='pooled')
        rows.append({'Compare': name, 'Feature': c, 't_statistic': t_stat, 'p_value': p_val,
                     'df': dof, 'CI_low': ci_low, 'CI_up': ci_up})

# DataFrame.append was removed in pandas 2.0; build the frame from a list of dicts instead
df_result = pd.DataFrame(rows)
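On synthetic data, a single `CompareMeans` call returns the same (t, p, df) triple used in the loop. `tconfint_diff` shown here is the t-based analogue of `zconfint_diff`; with 10,000 users per group the two intervals are practically identical:

```python
import numpy as np
import statsmodels.stats.api as sms

rng = np.random.default_rng(7)
a = rng.normal(0, 1, 500)
b = rng.normal(0, 1, 500)

cm = sms.CompareMeans(sms.DescrStatsW(a), sms.DescrStatsW(b))
t, p, dof = cm.ttest_ind(alternative='two-sided', usevar='pooled')
lo, hi = cm.tconfint_diff(alpha=0.05, alternative='two-sided', usevar='pooled')
print(f"t={t:.3f}, p={p:.3f}, df={dof:.0f}, 95% CI=({lo:.3f}, {hi:.3f})")
```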

View the results table:

df_result.groupby(['Feature','Compare']).sum().unstack().loc[data.columns[1:].to_list(),:]    # each (Feature, Compare) group holds one row, so sum() only reshapes
# .style.highlight_between(left=0, right=0.05, subset=['p_value'])

The table above shows, for each pair of groups, every feature's t statistic, p-value, degrees of freedom (df), and the upper and lower CI bounds.

For the t-test, a p-value below 0.05 indicates a significant difference in that variable between the two groups. Let's find which p-values are below 0.05:

df=df_result.groupby(['Feature','Compare']).sum().unstack().loc[data.columns[1:].to_list(),:]['p_value']
df.where(df<0.05)

 

Only two variables have p-values below 0.05. As the table shows, the null hypothesis is rejected only for age and followSum, and only when comparing Group 1 with Group 2. In other words, these two features differ significantly between Groups 1 and 2 at the 0.05 significance level.

However, age is the user's age, and followSum is the number of people the user followed the day before the experiment; neither variable should be related to the treatment.
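A caveat worth noting (not part of the original analysis): with 16 features and 3 pairwise comparisons there are 48 tests, so about 48 × 0.05 ≈ 2.4 "significant" results are expected by pure chance, which could explain the age and followSum findings. A multiple-comparison correction such as Holm's can be applied to the p-values; a sketch with hypothetical values:

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# hypothetical p-values, e.g. the smallest few among the 48 tests
p_values = np.array([0.011, 0.032, 0.210, 0.450, 0.800])

reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method='holm')
print("adjusted p-values:", np.round(p_adjusted, 3))
print("significant after correction:", reject)
```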

Draw a CI diagram for comparison:

df_CI = df_result.groupby(['Feature','Compare']).sum().unstack().loc[data.columns[1:].to_list(),:][['CI_low','CI_up']].stack().reset_index()
dis_cols = 4
dis_rows = int(np.ceil(len(df_CI['Feature'].unique()) / dis_cols))  # enough rows for all features
plt.figure(figsize=(3 * dis_cols, 3 * dis_rows), dpi=256)
colors = ['orange', 'blue', 'green']
for f, feature in enumerate(df_CI['Feature'].unique()):
    df_CI_f = df_CI[df_CI['Feature'] == feature].set_index('Compare').drop(columns='Feature')
    ax = plt.subplot(dis_rows, dis_cols, f + 1)
    for i, c in enumerate(df_CI_f.index):
        plt.plot(df_CI_f.loc[c, :].to_numpy(), (i, i), 'o-', color=colors[i], label=c)
    plt.xlabel(f'CI of {feature}', fontsize=10)
    plt.yticks([])
    plt.legend(loc="upper right")
plt.tight_layout()
plt.show()

 

Consistent with the t-test results above, the two characteristic variables age and followSum differ significantly between Group 1 and Group 2.

This result is surprising: neither the user's age nor the number of people followed the day before the experiment should be affected by the treatment. The differences are most likely due to imperfect randomization of the sample.


Conclusion:

By contrast, the metrics the treatment should actually affect (shares, clicks, comments, and likes) show no significant differences, so I conclude the three groups do not differ significantly in user behavior. In other words, providing previews for links shared in the app does not significantly affect the links' click-through rate.


Origin blog.csdn.net/weixin_46277779/article/details/129400239