E-commerce user value analysis (application of RFM model)

There are many ways to analyze the value of e-commerce users. This article covers the RFM model, one of the most widely cited and used customer segmentation models in both traditional enterprises and e-commerce.

1. Why analyze

  1. Internal factors: to improve users' goodwill. Different campaigns have different effects, and different users respond to them differently;
  2. External factors: to stay competitive in the industry. Most companies recognize precision marketing as an important tool for customer acquisition;

2. Analysis purpose

Based on their contribution to the company, customers are divided into three broad categories (key users, potential users, and lost users) so that the needs of each group can be identified.

3. Analysis method: the RFM model

3.1 Terminology

R, F, and M stand for three words:

  • R (recency): how recently the customer last purchased; the more recent the purchase, the higher the score;
  • F (frequency): how often the customer purchases; the higher the frequency, the higher the score;
  • M (monetary): how much the customer has spent; the larger the amount, the higher the score;
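
As a rough illustration of these three quantities, the sketch below derives them from a tiny order table. The table, its column names (`customer`, `order_date`, `amount`), and the fixed `reference_date` are all made up for the example and are not taken from the article's data set.

```python
import pandas as pd

# Hypothetical toy orders; the column names are illustrative only.
orders = pd.DataFrame({
    "customer": ["A", "A", "B", "C", "C", "C"],
    "order_date": pd.to_datetime([
        "2021-01-05", "2021-03-01", "2021-02-10",
        "2021-01-20", "2021-02-15", "2021-03-10",
    ]),
    "amount": [120.0, 80.0, 300.0, 50.0, 60.0, 40.0],
})

# Assumption: a fixed reference date keeps the result reproducible.
# A live analysis would more likely use the current date.
reference_date = pd.Timestamp("2021-03-15")

rfm_toy = orders.groupby("customer").agg(
    # R: days since the customer's most recent order
    recency_days=("order_date", lambda d: (reference_date - d.max()).days),
    # F: number of orders
    frequency=("order_date", "count"),
    # M: total amount spent
    monetary=("amount", "sum"),
)
print(rfm_toy)
```

Here customer C has the smallest recency and the highest frequency, while customer B has spent the most in a single order.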

3.2 Basic principles

From the company's point of view, user value follows the 80/20 rule (Pareto principle): roughly 20% of customers bring in 80% of profits, while the remaining 80% bring in only 20%.
We can subdivide further. Each of the three dimensions R, F, and M is split into high and low, which gives a three-dimensional coordinate system in which each small cube represents one type of user, for 2^3 = 8 categories in total.
All that remains is to classify users based on their scores in the three dimensions.
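
One way to picture the eight categories is to turn each dimension into a high/low flag and concatenate the flags. The sketch below does this on made-up scores; using the column mean as the high/low threshold is an assumption for illustration, not a rule stated in this article.

```python
import pandas as pd

# Hypothetical per-customer scores on a 1-5 scale.
scores = pd.DataFrame(
    {"R": [5, 1, 4, 2], "F": [5, 1, 2, 4], "M": [4, 2, 5, 1]},
    index=["u1", "u2", "u3", "u4"],
)

# Assumption: a dimension counts as "high" (1) when it is at or above the column mean.
high = (scores >= scores.mean()).astype(int)

# Concatenate the three flags into a code such as "111" (high R, F, M) or "010".
scores["segment_code"] = (
    high["R"].astype(str) + high["F"].astype(str) + high["M"].astype(str)
)
print(scores)
```

Each of the 2^3 = 8 possible codes corresponds to one small cube in the coordinate system.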

3.3 Scoring rules

If users are simply split into high and low, those near the dividing planes (the xy, xz, and yz planes) are easily misclassified. For a more accurate classification, we first quantize each of the three RFM values into 5 intervals, which gives 5^3 = 125 possible score combinations, and then compress them into 8 user groups. The rules are as follows:

  1. R score rule: sort users by the number of days since their most recent transaction, from smallest to largest, divide them into 5 levels, and assign 5, 4, 3, 2, and 1 points in turn;
  2. F score rule: sort users by transaction frequency from largest to smallest, divide them into 5 levels, and assign 5, 4, 3, 2, and 1 points in turn;
  3. M score rule: sort users by total transaction amount from largest to smallest, divide them into 5 levels, and assign 5, 4, 3, 2, and 1 points in turn;
  4. RFM total score: RFM = 0.2*R + 0.3*F + 0.5*M. Sorted from largest to smallest, the total falls into one of eight intervals:
    (4.5, 5]; (4, 4.5]; (3.5, 4];
    (3, 3.5]; (2.5, 3]; (2, 2.5];
    (1.5, 2]; [1, 1.5].
    The three weights 0.2, 0.3, and 0.5 are not fixed. As long as they sum to 1, they can be allocated freely to suit the specific business.
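
The weighted total and the eight fixed intervals from rule 4 can be sketched as follows. The helper name `rfm_total`, the sample scores, and the level numbering 1 (lowest) to 8 (highest) are assumptions made for the example.

```python
import pandas as pd

def rfm_total(r, f, m, weights=(0.2, 0.3, 0.5)):
    """Weighted RFM total; the weights must sum to 1 (hypothetical helper)."""
    w_r, w_f, w_m = weights
    if abs(w_r + w_f + w_m - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return w_r * r + w_f * f + w_m * m

# Interval edges from rule 4: [1, 1.5], (1.5, 2], ..., (4.5, 5]
edges = [1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5]

# Three made-up customers with (R, F, M) scores; rounding guards against float noise.
totals = pd.Series(
    [rfm_total(5, 5, 5), rfm_total(1, 1, 1), rfm_total(3, 4, 2)]
).round(2)

# include_lowest=True closes the first interval on the left so a total of 1 is kept.
levels = pd.cut(totals, bins=edges, labels=list(range(1, 9)), include_lowest=True)
print(pd.DataFrame({"total": totals, "level": levels}))
```

With the default weights, a customer scoring (3, 4, 2) gets a total of 2.8 and lands in the interval (2.5, 3].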

3.4 Python code to implement RFM model

1. Import the third-party data analysis libraries and load the data. The datetime library is used for time type conversion:

import pandas as pd
import numpy as np
import csv
import time
from datetime import datetime
# Show the output of every expression in a cell, not only the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
sale_data = pd.read_csv("D:/data/RFM_E_commerce/rfm_data.csv", encoding = 'gbk')
sale_data.head()

2. Check the field types, the number of rows, and whether there are missing values. Any missing values need to be filled in;

sale_data.dtypes
len(sale_data)
sale_data.isnull().any()

3. Consider user stickiness: the stickier the user, the more valuable they are. To highlight this, we de-duplicate the orders: when a customer places several orders on the same day, we record only one order for that customer on that day. (This step is optional; decide whether it is needed for your own situation.)

sale_data.drop_duplicates(subset = ['buy_time', 'city_code', 'customer_code'], keep = 'first', inplace = True)
len(sale_data)
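
The de-duplication above only collapses same-day orders if buy_time holds a date without a time of day. Whether that is true of rfm_data.csv is not shown here, so the following is only a sketch of how the step could be made robust if the field did carry timestamps. The toy rows and the helper column `buy_day` are hypothetical.

```python
import pandas as pd

# Hypothetical rows: the same customer orders twice on 2021-03-01 and once on 2021-03-02.
toy = pd.DataFrame({
    "buy_time": pd.to_datetime(
        ["2021-03-01 09:00", "2021-03-01 18:30", "2021-03-02 10:00"]
    ),
    "city_code": ["c1", "c1", "c1"],
    "customer_code": ["u1", "u1", "u1"],
})

# Strip the time of day so that orders on the same calendar day compare equal.
toy["buy_day"] = toy["buy_time"].dt.normalize()

# Keep the first order per customer per day.
deduped = toy.drop_duplicates(
    subset=["buy_day", "city_code", "customer_code"], keep="first"
)
print(len(deduped))
```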

4. Convert the buy_time field to a datetime type, and calculate date_diff, the number of days from each purchase to the present

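# Convert buy_time to a datetime type
sale_data['buy_time'] = pd.to_datetime(sale_data['buy_time'])

# Days between each purchase and today. This is vectorized and refers to the
# column by name, instead of relying on buy_time being the first column (x[0]).
sale_data['date_diff'] = (pd.to_datetime('today') - sale_data['buy_time']).dt.days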
sale_data.head()

5. Calculate the actual values of the three fields 'Last Consumption Interval', 'Number of Consumptions', and 'Total Consumption' (the columns 最后一次消费间隔, 消费次数, and 消费总额 in the code)

R_data = sale_data.groupby(['city_code', 'customer_code'])['date_diff']
F_data = sale_data.groupby(['city_code', 'customer_code'])['bill_code']
M_data = sale_data.groupby(['city_code', 'customer_code'])['sale_amt']
R_agg = R_data.agg([('最后一次消费间隔', 'min')])
F_agg = F_data.agg([('消费次数', 'count')])
M_agg = M_data.agg([('消费总额', 'sum')])
rfm = R_agg.join(F_agg).join(M_agg)

rfm

6. Divide the three fields 'Last Consumption Interval', 'Number of Consumptions', and 'Total Consumption' into 5 levels by quintile, and assign the corresponding scores;

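rfm = rfm.reset_index(drop = False)

# R: a shorter interval is better, so the labels run from 5 down to 1
bins = rfm['最后一次消费间隔'].quantile(q=np.linspace(0,1,6), interpolation= 'nearest')
bins[0] = 0  # lower the first edge to 0 so the smallest value falls inside the first bin
labels = [5, 4, 3, 2, 1]
R1 = pd.cut(rfm['最后一次消费间隔'], bins, labels=labels)

# F: more purchases are better, so the labels run from 1 up to 5
bins = rfm['消费次数'].quantile(q=np.linspace(0,1,6), interpolation= 'nearest')
bins[0] = 0
labels = [1, 2, 3, 4, 5]
F1 = pd.cut(rfm['消费次数'], bins, labels=labels)

# M: a larger total is better, so the labels run from 1 up to 5
bins = rfm['消费总额'].quantile(q=np.linspace(0,1,6), interpolation= 'nearest')
bins[0] = 0
labels = [1, 2, 3, 4, 5]
M1 = pd.cut(rfm['消费总额'], bins, labels=labels)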

rfm['R1']=R1  
rfm['F1']=F1  
rfm['M1']=M1
rfm.head()
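
One caveat with quantile edges: if many customers share the same value (for example, most have bought exactly once), two edges can coincide, and pd.cut raises an error for duplicate bin edges. A common workaround, sketched below, is to rank the values first and bin the ranks with pd.qcut. This is an alternative technique rather than the approach used in this article, and the helper name `quintile_score` and the sample data are hypothetical.

```python
import pandas as pd

def quintile_score(values, higher_is_better=True):
    """Score a Series from 1 to 5 by quintile of its rank (hypothetical helper).

    Ranking with method="first" gives every row a distinct rank, so the
    quantile edges are always unique even when the raw values are tied.
    """
    ranks = values.rank(method="first", ascending=True)
    labels = [1, 2, 3, 4, 5] if higher_is_better else [5, 4, 3, 2, 1]
    return pd.qcut(ranks, q=5, labels=labels).astype(int)

# Made-up purchase counts with many ties at 1.
freq = pd.Series([1, 1, 1, 1, 1, 1, 2, 2, 3, 7])
f_scores = quintile_score(freq)

# For recency, fewer days is better, so the labels are reversed.
days = pd.Series([10, 20, 30, 40, 50])
r_scores = quintile_score(days, higher_is_better=False)
print(f_scores.tolist(), r_scores.tolist())
```

The trade-off is that tied customers can receive different scores depending on row order, so this suits cases where equal-sized groups matter more than treating ties identically.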

7. Calculate the total score with the RFM scoring formula, assign the corresponding user levels, and count the users at each level to see the overall distribution.

rfm['RFM'] = 0.2*R1.astype(int) + 0.3*F1.astype(int) + 0.5*M1.astype(int)


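# Split the total score into 8 levels using quantile edges
bins = rfm['RFM'].quantile(q=np.linspace(0,1,9), interpolation= 'nearest')
bins[0] = 0
# Labels from lowest to highest level: lost users, general retention customers,
# general development users, potential users, important win-back users,
# important retention users, important development customers, important value users
labels = ['流失用户', '一般维持客户', '一般发展用户', '潜力用户', '重要挽留用户', '重要保持用户', '重要发展客户', '重要价值用户']
rfm['用户分层'] = pd.cut(rfm['RFM'], bins, labels=labels)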


rfm = rfm.rename(columns={
    '最后一次消费间隔': 'last_sale_day',
    '消费次数': 'sale_frq',
    '消费总额': 'sale_amt',
    '用户分层': 'customer_classification',
})


rfm_table = rfm.pivot_table(values = 'customer_code', index = 'customer_classification', aggfunc='count')
rfm_result = rfm_table.rename(columns={'customer_code': 'customer_num'}).reset_index()
print(rfm_result)


4. Analysis and application

Apply the operating strategy that suits each user type:
[Figure: operating strategies for each user type]

5. Reflections on the model

  1. The model has two advantages. First, e-commerce companies can easily obtain the accurate data it requires. Second, the resulting levels are highly interpretable and easy for business teams to understand.
  2. The disadvantage of the model is that it does not suit products with a long service life, such as large appliances (air conditioners, refrigerators, TVs), which are purchased infrequently.
  3. Model optimization:
    First: adjust the thresholds (the weights of the three dimensions) according to the resulting user groups, the observed operational results, and the campaign rules, until the division is as reasonable as possible;
    Second: the model is not fair to the highest-value users. For example, if I am the user with the highest purchasing power on a platform but receive the same service as the 10% of users ranked below me, missing out on "emperor" treatment is bound to feel unsatisfying. The top 10 users should be given VIP service as an operational supplement.

Origin blog.csdn.net/Keeomg/article/details/114987804