Recommender system evaluation

User research

  • Pros: can collect many indicators reflecting users' subjective perception; carries lower risk than an online experiment; mistakes are easy to correct after they occur.
  • Cons: recruiting test users is costly; it is hard to organize a large-scale user test, so the results lack statistical significance.

 

Online Assessment

Design an online experiment, then measure the performance of the recommender system from real user feedback. The two most important decisions in online assessment are the online test mode and the online evaluation indicators.

A/B test introduction

An A/B test develops two schemes for the same goal: part of the users are served scheme A and another part scheme B, feedback from both groups is recorded, and the corresponding evaluation indicators determine which scheme is better.
In the Internet industry, with its fast release cycles, the A/B test is an experimental method that lets us try and fail quickly. Statistically, an A/B test is a form of hypothesis testing: it tells developers whether a change to the system is effective and how much KPI improvement it brings.
In a recommender system, to compare the effect of different algorithms or different data sets on the final result, users are randomly divided into several groups by some rule, each group is served a different recall or ranking algorithm, and finally the evaluation indicators of the different groups are analyzed.
A typical A/B test architecture is shown in the figure below:

Note that after users are bucketed and item recall is done, the user buckets need to be shuffled and redistributed before the ranking stage. This keeps the buckets of different stages uncorrelated, so the recall A/B test and the ranking A/B test remain independent of each other.
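This re-shuffling between stages can be sketched with salted hashing: each experiment layer hashes the user ID with its own salt, so a user's bucket in the recall layer is statistically independent of their bucket in the ranking layer. The layer names and bucket count below are illustrative assumptions, not part of any specific architecture.

```python
import hashlib

def bucket(user_id: str, layer_salt: str, num_buckets: int = 2) -> int:
    """Deterministically assign a user to a bucket within one experiment layer.

    Salting the hash with the layer name re-shuffles users between layers,
    so recall-layer buckets and ranking-layer buckets are uncorrelated.
    """
    digest = hashlib.md5(f"{layer_salt}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % num_buckets

# The same user can land in different buckets in different layers.
recall_group = bucket("user_42", layer_salt="recall_layer")
ranking_group = bucket("user_42", layer_salt="ranking_layer")
```

Because the assignment is a pure function of the user ID and salt, the same user always sees the same variant within a layer, which keeps the experiment consistent across sessions.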

 

Online evaluation indicators

Online evaluation indicators assess the quality of the recommender system in real business scenarios. Common online indicators include CTR (click-through rate), conversion rate, GMV (gross merchandise volume), and so on.
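These indicators are typically computed from impression logs. A minimal sketch, assuming a simplified log record of the form `{"clicked": ..., "converted": ..., "revenue": ...}` (the schema is an illustrative assumption):

```python
def online_metrics(logs):
    """Compute CTR, conversion rate, and GMV from a list of impression logs.

    Each log record is assumed to be a dict:
    {"clicked": bool, "converted": bool, "revenue": float}.
    """
    impressions = len(logs)
    clicks = sum(1 for r in logs if r["clicked"])
    conversions = sum(1 for r in logs if r["converted"])
    gmv = sum(r["revenue"] for r in logs if r["converted"])
    return {
        "ctr": clicks / impressions if impressions else 0.0,
        "cvr": conversions / clicks if clicks else 0.0,  # conversions per click
        "gmv": gmv,
    }
```

In an A/B test these metrics are computed per bucket and then compared between the A and B groups.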

 

Offline assessment

Run the recommender system to be assessed on an experimental data set and measure its effect with offline evaluation indicators. Compared with online assessment, offline assessment is more convenient and more economical: once the data set is chosen, the system under assessment simply needs to run on that data set. Offline assessment has two main parts: splitting the data set and choosing offline evaluation indicators.
Splitting the data set
In machine learning, a data set is generally split into a training data set, a validation data set, and a test data set. Their functions are as follows.
Training data set (Train Dataset): used to build the machine learning model.
Validation data set (Validation Dataset): assists model building; provides an unbiased estimate of the model during training, which is then used to tune its hyperparameters.
Test data set (Test Dataset): evaluates the performance of the final trained model.
The order of the three data sets in the model training and evaluation process is shown in the figure below.

Common ways to split the data set are:

  • Hold-out method
  • K-fold cross validation
  • Bootstrapping
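The first two splitting methods can be sketched in a few lines; the exact ratios and fold count below are illustrative assumptions:

```python
import random

def holdout_split(data, test_ratio=0.2, seed=0):
    """Hold-out method: shuffle once, then carve off a test portion."""
    items = list(data)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * (1 - test_ratio))
    return items[:cut], items[cut:]

def kfold(data, k=5):
    """K-fold cross validation: yield (train, test) pairs,
    with each fold serving as the test set exactly once."""
    items = list(data)
    for i in range(k):
        test = items[i::k]  # every k-th item, offset by i
        train = [x for j, x in enumerate(items) if j % k != i]
        yield train, test
```

Bootstrapping differs in that it samples the training set with replacement, so some records appear multiple times and the out-of-sample records form the test set.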

Offline evaluation indicators

Offline evaluation indicators measure, before going online, what the prediction models in the recommender system can achieve. Common offline evaluation indicators fall into two categories:
  • Accuracy indicators: the basic indicators for assessing a recommender system, measuring how accurately the algorithm predicts users' preference for the recommended items. They can be divided into classification accuracy indicators, rating prediction accuracy indicators, and rating correlation indicators.
  • Non-accuracy indicators: once the recommender system reaches a certain accuracy, these measure its richness and diversity.

The classification accuracy indicators include:

  • AUC
  • Accuracy
  • Precision
  • Recall
  • F-measure
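For a top-N recommendation list, Precision, Recall, and F-measure reduce to set overlaps between what was recommended and what the user actually liked. A minimal sketch:

```python
def precision_recall_f1(recommended, relevant):
    """Classification accuracy for a top-N recommendation list.

    recommended: items the system returned; relevant: items the user liked.
    """
    rec, rel = set(recommended), set(relevant)
    hits = len(rec & rel)
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(rel) if rel else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

AUC, by contrast, is computed from the model's raw scores rather than a cut-off list: it is the probability that a randomly chosen positive item is scored higher than a randomly chosen negative one.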

The rating prediction accuracy indicators include:

  • Mean absolute error (MAE)
  • Mean square error (MSE)
  • Root mean square error (RMSE)
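These three error measures follow directly from their definitions; a minimal sketch over paired true and predicted ratings:

```python
import math

def rating_errors(y_true, y_pred):
    """MAE, MSE, and RMSE between true and predicted ratings."""
    n = len(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    return mae, mse, math.sqrt(mse)
```

RMSE penalizes large individual errors more heavily than MAE, which is why it is often preferred when a few badly wrong predictions matter more than many slightly wrong ones.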

The rating correlation indicators include:

  • Pearson product-moment correlation coefficient
  • Spearman's rank correlation coefficient
  • Kendall rank correlation coefficient
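A minimal sketch of the Pearson coefficient between true and predicted rating vectors (Spearman's coefficient is the same formula applied to the ranks of the values rather than the values themselves):

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation between two rating vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

A value near +1 means the predicted ratings rise and fall with the true ratings; near -1 means they move in opposite directions.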

The ranking accuracy indicators include:

  • Ranking evaluation score

The non-accuracy indicators include:

  • Diversity
  • Novelty
  • Serendipity
  • Coverage
  • Credibility
  • Real-time performance
  • Robustness
  • Business goals
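Two of these, coverage and diversity, have simple common formulations; a minimal sketch, assuming an item-similarity function in [0, 1] is available (e.g. cosine similarity of item embeddings — an assumption here, not prescribed by the list above):

```python
def coverage(recommendation_lists, catalog_size):
    """Fraction of the catalog that appears in at least one user's list."""
    recommended = set()
    for items in recommendation_lists:
        recommended.update(items)
    return len(recommended) / catalog_size

def intra_list_diversity(items, similarity):
    """1 minus the average pairwise similarity within one list.

    `similarity(a, b)` is any item-similarity function in [0, 1]
    (an illustrative assumption, e.g. cosine similarity of embeddings).
    """
    pairs = [(a, b) for i, a in enumerate(items) for b in items[i + 1:]]
    if not pairs:
        return 0.0
    return 1 - sum(similarity(a, b) for a, b in pairs) / len(pairs)
```

Low coverage signals that the system keeps recommending the same popular items; low intra-list diversity signals that each user's list is full of near-duplicates.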


 


Origin www.cnblogs.com/xumaomao/p/11442895.html