Implementing the LRFM model in Python to analyze customer value

1. Analysis background

The data set contains sales records from an e-commerce platform, covering April 22, 2010 to July 24, 2014. Analyzing these records can reveal customer value.

Here KMeans clustering is used to implement the LRFM model and analyze customer value, which makes it easier to segment customers, target promotions, and increase sales.

LRFM model definition:

  • L: The time interval between the member creation date and July 25, 2014 (unit: month)

  • R: The time interval between the last purchase by the member and July 25, 2014 (unit: month)

  • F: Number of member purchases

  • M: total purchase amount of the member
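As a quick illustration of these definitions, the sketch below computes the four values for a single hypothetical member. The dates and amounts are made up, and a month is approximated as 30 days, which is an assumption that matches the conversion used later in this article.

```python
from datetime import datetime

# Reference date used throughout the article
ref = datetime(2014, 7, 25)

# Hypothetical member: creation date and (sale date, amount) purchases
created = datetime(2013, 7, 25)
purchases = [
    (datetime(2014, 1, 26), 120.0),
    (datetime(2014, 6, 25), 80.0),
]

last_purchase = max(d for d, _ in purchases)

L = round((ref - created).days / 30, 3)        # membership length in months
R = round((ref - last_purchase).days / 30, 3)  # recency in months
F = len(purchases)                             # number of purchases
M = sum(a for _, a in purchases)               # total purchase amount
print(L, R, F, M)
```

A long-standing member with a recent purchase ends up with a large L and a small R, which is the pattern the clustering step later looks for.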

2. Analysis process

[Image: analysis process diagram]

3. Data Exploration

3.1 Import related packages and read data

import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
from sklearn import preprocessing
from datetime import datetime
from sklearn.cluster import KMeans

plt.rcParams['font.sans-serif'] = 'SimHei'
%matplotlib inline

# Read the data
df = pd.read_csv(r'C:/Users/Administrator/Desktop/RFM分析1.csv',
                 engine='python')
# Check the number of rows and columns
df.shape

Output:

3.2 View table structure

The table structure shows that only class2 has missing values in this data set. That column is not needed for the analysis, so it is left uncleaned for now.
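The code for this step is not shown in the article. One way to run this kind of check, sketched on a small made-up table, is to combine `info()` with a per-column count of missing values. The `amount` column and all values here are hypothetical stand-ins; only the names `UseId` and `class2` come from the article.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the sales table; only class2 has missing values
toy = pd.DataFrame({
    "UseId": [1, 1, 2],
    "class2": ["a", np.nan, np.nan],
    "amount": [10.0, 20.0, 5.0],
})

toy.info()                     # dtypes and non-null counts per column
missing = toy.isnull().sum()   # number of missing values per column
print(missing[missing > 0])    # keep only the columns that have gaps
```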


3.3 Descriptive analysis view

The sales amount column contains negative values; these outliers must be filtered out during data cleaning.
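The code for this step is not shown either. A minimal sketch of such a check, on made-up numbers with a hypothetical `amount` column, uses `describe()` and then counts the non-positive rows:

```python
import pandas as pd

# Hypothetical sales amounts, including one negative and one zero value
toy = pd.DataFrame({"amount": [59.0, -20.0, 0.0, 130.5]})

summary = toy["amount"].describe()   # count, mean, std, min, quartiles, max
print(summary)

# A negative minimum signals rows that need to be filtered out
n_bad = int((toy["amount"] <= 0).sum())
print(n_bad)
```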


4. Data cleaning

4.1 Filter out sales amounts <= 0

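# Some sales amounts are less than or equal to 0; filter them out
# There are 22,542 rows here
# .copy() avoids a SettingWithCopyWarning when columns are assigned later
data = df[df['销售金额'] > 0].copy()
data.shape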

Output:

4.2 Convert the member creation date and sales date to datetime format

data['会员创建日期'] = pd.to_datetime(data['会员创建日期'])
data['销售日期'] = pd.to_datetime(data['销售日期'])

# Check that the conversion succeeded
data.info()

Output:

5. Construct L, R, F, M indicators

5.1 Extract useful indicators

  • L = reference date (here: July 25, 2014) - member creation date

  • R = reference date (here: July 25, 2014) - the latest (maximum) sale date

  • F = the number of purchases by the user (counted as the number of distinct serial numbers)

  • M = the total amount the user has spent

Code:

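# Aggregate per member: creation date, earliest and latest sale date,
# total sales amount (M) and number of distinct serial numbers (F).
# L and R are derived from these dates in section 5.2 (days divided by 30).
data1 = data.groupby('UseId').agg({'会员创建日期': ['min'],
                                   '销售日期': ['min', 'max'],
                                   '销售金额': ['sum'],
                                   '流水号': ['nunique']})
data1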

Output:

[Image: the aggregated data1 table]
Drop the first level of column names and rename the columns:

# Drop the first level of the column names
data1.columns = [col[1] for col in data1.columns]
# Rename the columns
data1.columns = ['会员创建日期', '最早销售日期', '最晚销售日期', 'M', 'F']
data1

Output:

[Image: data1 with the renamed columns]
The M and F indicators are now constructed.
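An alternative to flattening the two-level column index afterwards is pandas named aggregation, which assigns the final column names inside `agg`. The sketch below uses made-up data and hypothetical English column names (`created`, `sale_date`, `amount`, `serial_no`) in place of the real ones.

```python
import pandas as pd

# Hypothetical toy data standing in for the cleaned sales table
toy = pd.DataFrame({
    "UseId": [1, 1, 2],
    "created": pd.to_datetime(["2013-01-01", "2013-01-01", "2014-03-01"]),
    "sale_date": pd.to_datetime(["2014-01-10", "2014-06-20", "2014-07-01"]),
    "amount": [100.0, 50.0, 30.0],
    "serial_no": ["a1", "a2", "b1"],
})

# Each keyword is the output column name: (input column, aggregation)
agg = toy.groupby("UseId").agg(
    created=("created", "min"),
    first_sale=("sale_date", "min"),
    last_sale=("sale_date", "max"),
    M=("amount", "sum"),
    F=("serial_no", "nunique"),
)
print(agg)
```

The result already has flat, readable column names, so no renaming step is needed.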

5.2 Construct the L and R indicators

# Compute L and R first, then convert them to months
data1['L'] = datetime.strptime('2014-7-25', '%Y-%m-%d') - data1['会员创建日期']
data1['R'] = datetime.strptime('2014-7-25', '%Y-%m-%d') - data1['最晚销售日期']

# Convert L and R to months (30 days per month), rounded to three decimals
data1['L'] = data1['L'].apply(lambda x: round(x.days/30, 3))
data1['R'] = data1['R'].apply(lambda x: round(x.days/30, 3))
data1

Output:


Extract useful indicators:

LRFM_data = data1[['L', 'R', 'F', 'M']]
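The day-to-month conversion in section 5.2 can also be written without `apply`, using the `.dt` accessor on the timedelta column. This is a sketch on two made-up dates, again assuming 30 days per month:

```python
import pandas as pd

ref = pd.Timestamp("2014-07-25")

# Hypothetical last-purchase dates for two members
dates = pd.Series(pd.to_datetime(["2014-06-25", "2013-07-25"]))

# Timestamp minus a datetime Series gives a timedelta Series
delta = ref - dates

# .dt.days extracts whole days; divide by 30 to approximate months
months = (delta.dt.days / 30).round(3)
print(months.tolist())
```

Both forms give the same values; the vectorized one tends to be faster on large tables.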

6. Standardize the L, R, F, M data with Z-scores

ss = preprocessing.StandardScaler()
ss_LRFM_data = ss.fit_transform(LRFM_data)
ss_LRFM_data

Output:

[Image: the standardized LRFM array]
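StandardScaler applies z = (x - mean) / std to each column, using the population standard deviation. The sketch below checks that against a manual NumPy computation on a small made-up matrix:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical matrix: 3 members, 2 indicators on very different scales
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])

scaled = StandardScaler().fit_transform(X)

# Manual z-score per column; np.std defaults to ddof=0 (population std)
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(scaled, manual))
```

After this step every column has mean 0 and standard deviation 1, so M (large amounts) no longer dominates the distance calculation in KMeans.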

7. Use KMeans for cluster analysis

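# n_clusters: the number of clusters
# (n_jobs was removed from KMeans in scikit-learn 1.0, so it is not passed here)
kmodel = KMeans(n_clusters=5)
kmodel.fit(ss_LRFM_data)
# View the cluster centers
kmodel.cluster_centers_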

Output:

[Image: the cluster centers array]
Convert the result into a DataFrame:

# Index labels: 客户群N means "customer group N"
client_level = pd.DataFrame(kmodel.cluster_centers_,
                            index=['客户群1', '客户群2', '客户群3', '客户群4', '客户群5'],
                            columns=['L', 'R', 'F', 'M'])
client_level

Output:

[Image: the client_level table]
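The cluster centers are expressed in standardized units. It may also help to see how many members fall into each group and what the centers look like in the original units. The sketch below does both on a tiny made-up LRFM table with two clusters; the data, `random_state`, and `n_init` values are assumptions for illustration, not the article's settings.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

cols = ["L", "R", "F", "M"]

# Hypothetical LRFM table: two well-separated groups of three members
toy = pd.DataFrame({
    "L": [30.0, 31.0, 29.0, 2.0, 3.0, 1.0],
    "R": [1.0, 2.0, 1.5, 10.0, 11.0, 12.0],
    "F": [20, 22, 19, 1, 2, 1],
    "M": [5000.0, 5200.0, 4800.0, 100.0, 150.0, 90.0],
})

scaler = StandardScaler()
X = scaler.fit_transform(toy[cols])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Attach each member's cluster label and count the members per cluster
toy["cluster"] = km.labels_
sizes = toy["cluster"].value_counts().sort_index()

# Map the standardized centers back to months, counts, and amounts
centers = pd.DataFrame(scaler.inverse_transform(km.cluster_centers_),
                       columns=cols)
print(sizes)
print(centers.round(2))
```

Cluster numbering is arbitrary and can change between runs, so groups are best identified by their center values rather than by their label.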

8. Categorize customer groups based on the results

  • The larger L is, the longer the member has been registered relative to the reference date (July 25, 2014), i.e. the older the customer. Larger is better.

  • The smaller R is, the more recent the last purchase relative to the reference date (July 25, 2014). Smaller is better.

  • The larger F is, the more purchases the member has made.

  • The larger M is, the more the member has spent in total.
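Because the indicators were z-scored, 0 corresponds to the overall mean of each indicator. One simple way to read "large" and "small" off the cluster centers is to compare each value with 0. The centers below are made up for illustration and are not the article's results; the threshold of 0 is likewise an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical standardized cluster centers
centers = pd.DataFrame(
    {"L": [0.8, -0.9], "R": [-0.7, 1.1], "F": [1.2, -0.4], "M": [1.5, -0.3]},
    index=["group A", "group B"],
)

# Above the overall mean (0) counts as "large", otherwise "small"
profile = pd.DataFrame(
    np.where(centers > 0, "large", "small"),
    index=centers.index,
    columns=centers.columns,
)
print(profile)
```

Here group A (long membership, recent purchase, high frequency and amount) would be the high-value segment, while group B would be the low-value one.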

Customer group 1 analysis:
L is large, R is small, F is large, and M is large. These are judged to be important customers to develop.

Customer group 2 analysis:
L is large, R is large, F is small, and M is small. These are judged to be important customers to win back.

Customer group 3 analysis:
L is small, R is small, F is small, and M is small. These are judged to be low-value customers.

Customer group 4 analysis:
L is large, R is large, F is small, and M is small. These are judged to be general-value customers.

Customer group 5 analysis:
L is large, R is small, F is large, and M is large. These are judged to be important customers to maintain.


Origin blog.51cto.com/15064638/2598044