Customer portrait

Customer portrait steps
  1. Variable selection

  • Consolidate the data (assume it contains more than 800 variables)
  • Eliminate unavailable variables
  • Run variable clustering / correlation analysis and combine the results with the business focus to select variables
  • Run a preliminary clustering and remove variables that have no effect on it. Suppose 50 variables remain at this point. First perform hierarchical clustering on these 50 variables, then determine the most stable number of clusters (k) based on the Rand index. If k = 10 turns out to be the most stable, the variables fall into 10 categories, and the 50 variables are pruned category by category. Usually one variable per category (for example the first) is kept to represent it, which leaves 10 variables and achieves dimensionality reduction.
  • The above uses hierarchical clustering to reduce the number of variables. Alternatively, if the case does not require the original variables, principal component analysis (PCA) can be used. PCA constructs new variables as combinations of the 50 original ones. The components that together explain 80% of the original information are kept as the new variables, and the remaining components, which explain only 20%, are discarded. This also achieves dimensionality reduction.
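The two reduction routes above can be sketched in Python. This is a minimal illustration, not the author's implementation. It assumes NumPy, SciPy and scikit-learn are available, uses 1 - |correlation| as the distance between variables, and takes the number of variable groups as given rather than choosing it by Rand-index stability. The function names and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def representative_variables(X, n_groups):
    """Cluster the columns of X hierarchically and keep one column per group."""
    corr = np.corrcoef(X, rowvar=False)
    # Assumed distance: 0 for perfectly correlated variables, 1 for uncorrelated.
    dist = 1.0 - np.abs(corr)
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(Z, t=n_groups, criterion="maxclust")
    # Keep the first variable of each group as its representative.
    keep = [int(np.flatnonzero(labels == g)[0]) for g in np.unique(labels)]
    return sorted(keep), labels


def pca_reduce(X, explained=0.80):
    """Keep the fewest principal components that explain `explained` of the variance."""
    X_std = StandardScaler().fit_transform(X)
    pca = PCA(n_components=explained, svd_solver="full")
    return pca.fit_transform(X_std), pca.explained_variance_ratio_


# Synthetic stand-in for the 50 variables: 10 latent factors, 5 noisy copies each.
rng = np.random.default_rng(0)
factors = rng.normal(size=(500, 10))
X = np.repeat(factors, 5, axis=1) + 0.1 * rng.normal(size=(500, 50))

keep, labels = representative_variables(X, n_groups=10)
scores, ratio = pca_reduce(X, explained=0.80)
print("kept columns:", keep)
print("PCA components:", scores.shape[1], "explained:", round(float(ratio.sum()), 3))
```

The first route keeps original, interpretable variables. The second produces new combined variables, which is why it only fits cases that do not need the originals.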
  2. Determine the number of groups (k)
  • Based on project and business needs, set an initial range for k (3 to 8) and group the customers with the K-Means algorithm
  • Fit solutions with 3 to 8 clusters, calculate the related statistics (R^2), and use them to narrow k to a reasonable range (5 to 6)
  • Compare the candidate solutions on key indicators (clustering goodness or silhouette coefficient). With 6 to 7 groups, some groups turn out to be quite similar and some account for a very small share of customers, so we conclude that 5 groups are enough
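The comparison across k can be sketched as follows. This is a hedged example that assumes scikit-learn. It interprets R^2 as 1 minus the within-cluster sum of squares divided by the total sum of squares, which is one common definition and may differ from the statistic the original tool reports. The function name `evaluate_k` and the five synthetic blobs are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def evaluate_k(X, k_values=range(3, 9), seed=0):
    """Fit K-Means for each k and collect R^2, silhouette and smallest group share."""
    X_std = StandardScaler().fit_transform(X)
    total_ss = float(((X_std - X_std.mean(axis=0)) ** 2).sum())
    rows = []
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X_std)
        r2 = 1.0 - km.inertia_ / total_ss  # assumed definition of R^2
        sil = float(silhouette_score(X_std, km.labels_))
        # Share of customers in the smallest group, to spot tiny clusters.
        min_share = float(np.bincount(km.labels_, minlength=k).min() / len(X_std))
        rows.append({"k": k, "r2": r2, "silhouette": sil, "min_share": min_share})
    return rows


# Synthetic customers: five well-separated groups of 200 points each.
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10], [5, 20]], dtype=float)
X = np.vstack([c + rng.normal(size=(200, 2)) for c in centers])

rows = evaluate_k(X)
for row in rows:
    print(row)
best_k = max(rows, key=lambda r: r["silhouette"])["k"]
print("k with the highest silhouette:", best_k)
```

R^2 generally keeps rising as k grows, so it is more useful for narrowing the range than for picking a single value. The silhouette score and the smallest group share help to rule out solutions with near-duplicate or very small groups.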
  3. Profile each group quantitatively
  • Choose the dimensions needed to portray the customers and calculate their mean / distribution within each group
  • Compare these means / distributions with those of the other groups or with the overall indicators to understand the characteristics of each group
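A small pandas sketch of this profiling step is shown below. The column names, the data and the helper `profile_groups` are hypothetical. The ratio of each group mean to the overall mean is just one simple way to compare a group with the overall indicators.

```python
import pandas as pd


def profile_groups(df, group_col, dims):
    """Return per-group means, their ratio to the overall mean, and group shares."""
    group_means = df.groupby(group_col)[dims].mean()
    overall = df[dims].mean()
    ratio_to_overall = group_means / overall  # 1.0 means "same as overall"
    share = df[group_col].value_counts(normalize=True).sort_index()
    return group_means, ratio_to_overall, share


# Hypothetical customers with a cluster label from the previous step.
customers = pd.DataFrame({
    "group": [0, 0, 1, 1, 1, 2],
    "monthly_spend": [100.0, 140.0, 20.0, 30.0, 40.0, 300.0],
    "visits": [4, 6, 1, 2, 3, 10],
})

means, ratio, share = profile_groups(customers, "group", ["monthly_spend", "visits"])
print(means)
print(ratio.round(2))
print(share)
```

A ratio well above 1 marks a dimension on which the group stands out from the overall customer base, and a ratio well below 1 marks one on which it lags.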

Origin blog.csdn.net/weixin_41636030/article/details/94361537