Building and Understanding an RFM Model in R

About the Author

Du Yu, a member of the EasyCharts team and a columnist for the R Language Chinese Community. Interests: Excel business charts, R data visualization, and geographic data visualization.

Founder of the personal WeChat public account "Data Little Rubik's Cube" (WeChat ID: datamofang).



The RFM model is an exploratory analysis method widely used in marketing and customer relationship management (CRM). It uncovers the value patterns behind customer behavior so that the data can be put to better use in driving business growth and managing customers.

RFM is an acronym for three types of customer behaviors:

R: Recency - the time elapsed since the customer's most recent transaction. The larger the R value, the longer it has been since the customer last made a purchase, and vice versa;
F: Frequency - the number of transactions the customer made in the recent period. The larger the F value, the more frequently the customer buys, and vice versa;
M: Monetary - the total amount the customer spent in the recent period. The larger the M value, the higher the customer's value, and vice versa.
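As a rough illustration of these three definitions, the sketch below derives raw R, F, and M values from a tiny transaction table in Python with pandas. The data is made up, the column names simply mirror the ones used later in this article, and the choice of the latest transaction date as the reference date is an assumption.

```python
import pandas as pd

# Hypothetical toy transactions; column names mirror those used later in the article.
tx = pd.DataFrame({
    "UserID":    [1, 1, 2, 3, 3, 3],
    "PayDate":   pd.to_datetime(["2017-01-05", "2017-12-01", "2017-06-15",
                                 "2017-03-01", "2017-11-20", "2017-12-30"]),
    "PayAmount": [100, 50, 300, 20, 30, 40],
})

# Assumption: the reference date is the latest transaction date in the data.
ref_date = tx["PayDate"].max()
tx["interval"] = (ref_date - tx["PayDate"]).dt.days

rfm = tx.groupby("UserID").agg(
    Recency=("interval", "min"),      # days since the most recent purchase
    Frequency=("PayDate", "count"),   # number of transactions
    Monetary=("PayAmount", "sum"),    # total amount spent
)
print(rfm)
```

Here customer 3 bought most recently (Recency 0) and most often (Frequency 3), while customer 2 spent the most in a single purchase.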

Typically, the three raw RFM indicators are binned by quantile, which turns each indicator into a factor with several levels (pay attention to what each factor level means in practice: a small Recency is good, while a small Frequency or Monetary is not).

R_S: score based on the most recent transaction date; the closer it is to the current date, the higher the score, and vice versa;
F_S: score based on transaction frequency; the higher the frequency, the higher the score, and vice versa;
M_S: score based on transaction amount; the higher the amount, the higher the score, and vice versa.
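The scoring rules above can be sketched in Python with pandas as follows. The data and the helper name `quintile_score` are hypothetical, and ranking the values before binning is an assumption added here to keep the bin edges unique when values are tied; it is not how the R code later in this article bins.

```python
import pandas as pd

# Hypothetical raw RFM values for ten customers.
rfm = pd.DataFrame({
    "Recency":   [3, 10, 25, 40, 60, 90, 120, 200, 300, 360],
    "Frequency": [12, 9, 8, 7, 5, 4, 3, 2, 1, 1],
    "Monetary":  [900, 700, 650, 500, 420, 300, 250, 120, 80, 30],
})

def quintile_score(values, higher_is_better=True):
    """Score 1-5 by quintile; ranking first breaks ties so bin edges stay unique."""
    ranks = values.rank(method="first")
    bins = pd.qcut(ranks, q=5, labels=False) + 1   # 1 = lowest quintile
    return bins if higher_is_better else 6 - bins

# Recency is reversed: a smaller interval earns a higher score.
rfm["R_S"] = quintile_score(rfm["Recency"], higher_is_better=False)
rfm["F_S"] = quintile_score(rfm["Frequency"])
rfm["M_S"] = quintile_score(rfm["Monetary"])
print(rfm)
```

Each score column ends up with two customers per level, and the first customer (most recent, most frequent, highest spend) scores 5 on all three.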

To evaluate each customer as a whole, the three scores can also be combined into a weighted sum (the weights can be set by experts or chosen by marketers to suit their own business; this article uses 100:10:1 throughout).

RFM = 100*R_S + 10*F_S + 1*M_S
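As a small worked example of this weighting, the Python sketch below combines three scores into one number. The function name `rfm_composite` is made up for illustration.

```python
def rfm_composite(r_s, f_s, m_s, weights=(100, 10, 1)):
    """Combine the three scores using the 100:10:1 weighting adopted in this article."""
    w_r, w_f, w_m = weights
    return w_r * r_s + w_f * f_s + w_m * m_s

print(rfm_composite(5, 3, 1))  # 531
print(rfm_composite(2, 5, 5))  # 255
```

With single-digit scores, the result simply reads as the three scores side by side, so sorting by it orders customers by R_S first, then F_S, then M_S.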

The core of RFM is to combine the R, F, and M scores into a cube, which yields a very intuitive customer value matrix.

Finally, splitting each of R_S, F_S, and M_S into a high and a low level gives 2 x 2 x 2 = eight customer value types. Marketers can run targeted campaigns for each of these groups, thereby raising customer value and revenue.
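The eight-way split can be sketched in Python as a lookup on the three high/low flags. The sample scores are invented, the split at the mean follows the classification code later in this article, and the English segment names are loose renderings of that code's labels rather than official terms.

```python
import pandas as pd

# Hypothetical binned scores for four customers.
scores = pd.DataFrame({
    "rankR": [5, 1, 4, 2],
    "rankF": [5, 5, 1, 1],
    "rankM": [4, 5, 5, 1],
})

# Flag each score as high (2) or low (1) relative to its mean.
for col, flag in [("rankR", "R_S"), ("rankF", "F_S"), ("rankM", "M_S")]:
    scores[flag] = (scores[col] > scores[col].mean()).map({True: 2, False: 1})

# (R_S, F_S, M_S) -> segment name; the English names are illustrative translations.
segments = {
    (2, 2, 2): "high-value",
    (1, 2, 2): "key, maintain",
    (2, 1, 2): "key, develop",
    (1, 1, 2): "key, win back",
    (2, 2, 1): "key, protect",
    (1, 2, 1): "general, protect",
    (2, 1, 1): "general, develop",
    (1, 1, 1): "potential",
}
scores["Custom"] = [segments[key]
                    for key in zip(scores["R_S"], scores["F_S"], scores["M_S"])]
print(scores)
```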

Identifying high-quality customers through RFM analysis makes it possible to tailor communication and marketing services to them, and gives marketing decisions better support.


The following are brief steps for building an RFM model in R:

1. Data preparation:

## !/user/bin/env RStudio 1.1.423
## -*- coding: utf-8 -*-
## RFM Model

#* Most recent purchase (Recency)
#* Purchase frequency (Frequency)
#* Purchase amount (Monetary)

Code Part

setwd('D:/R/File/')
library('magrittr')
library('dplyr')
library('scales')
library('ggplot2')
library("easyGgplot2")
library("Hmisc")  
library('foreign')
library('lubridate')
# Read the SPSS file; '交易日期' is the transaction date column in trade.sav
mydata <- spss.get("trade.sav",datevars = '交易日期',reencode = 'GBK')
names(mydata) <- c('OrderID','UserID','PayDate','PayAmount')
# Replace the original dates with random dates in 2017 (seeded for reproducibility)
start_time <- as.POSIXct("2017/01/01", format="%Y/%m/%d") %>% as.numeric()
end_time <- as.POSIXct("2017/12/31", format="%Y/%m/%d") %>% as.numeric()
set.seed(233333)
mydata$PayDate <- runif(nrow(mydata),start_time,end_time) %>% as.POSIXct(origin="1970-01-01") %>% as.Date()
# Days between each transaction and the latest transaction date in the data
mydata$interval <- difftime(max(mydata$PayDate),mydata$PayDate ,units="days") %>% round() %>% as.numeric()

Aggregate transaction frequency, total transaction amount, and days since the most recent purchase by user ID:

salesRFM <- mydata %>% group_by(UserID) %>% summarise(Monetary = sum(PayAmount), Frequency = n(), Recency = min(interval))

2. Calculate the score

# Binned scores (quintiles, 1-5)

salesRFM <- mutate(
  salesRFM,
  # Recency is reversed (6 - bin) so that more recent customers score higher
  rankR = 6 - cut(Recency, breaks = quantile(Recency, probs = seq(0, 1, 0.2), names = FALSE), include.lowest = TRUE, labels = FALSE),
  rankF = cut(Frequency, breaks = quantile(Frequency, probs = seq(0, 1, 0.2), names = FALSE), include.lowest = TRUE, labels = FALSE),
  rankM = cut(Monetary, breaks = quantile(Monetary, probs = seq(0, 1, 0.2), names = FALSE), include.lowest = TRUE, labels = FALSE),
  rankRMF = 100*rankR + 10*rankF + 1*rankM
)
# Standardized scores (another way to compute the score)

salesRFM <- mutate(
  salesRFM,
  rankR1 = 1 - rescale(Recency, to = c(0,1)),
  rankF1 = rescale(Frequency, to = c(0,1)),
  rankM1 = rescale(Monetary, to = c(0,1)),
  # Weight the standardized scores, not the binned ranks
  rankRMF1 = 0.5*rankR1 + 0.3*rankF1 + 0.2*rankM1
)

3. Customer classification:

# Classify each RFM score as high (2) or low (1) relative to its mean:

salesRFM <- within(salesRFM,{R_S = ifelse(rankR > mean(rankR),2,1)
F_S = ifelse(rankF > mean(rankF),2,1)
M_S = ifelse(rankM > mean(rankM),2,1)})
# Assign customer types:

salesRFM <- within(salesRFM,{Custom = NA
Custom[R_S == 2 & F_S == 2 & M_S == 2] = 'High-value customer'
Custom[R_S == 1 & F_S == 2 & M_S == 2] = 'Key customer to maintain'
Custom[R_S == 2 & F_S == 1 & M_S == 2] = 'Key customer to develop'
Custom[R_S == 1 & F_S == 1 & M_S == 2] = 'Key customer to win back'
Custom[R_S == 2 & F_S == 2 & M_S == 1] = 'Key customer to protect'
Custom[R_S == 1 & F_S == 2 & M_S == 1] = 'General customer to protect'
Custom[R_S == 2 & F_S == 1 & M_S == 1] = 'General customer to develop'
Custom[R_S == 1 & F_S == 1 & M_S == 1] = 'Potential customer'
})


4. Visualization of analysis results:

4.1 Check the distribution of customers after RFM binning:

# RFM bin counts

ggplot(salesRFM,aes(rankF)) + geom_bar()+ facet_grid(rankM~rankR) + theme_gray()


4.2 RFM heat map:

#RFM heatmap

heatmap_data <- salesRFM %>% group_by(rankF,rankR) %>% dplyr::summarize(M_mean = mean(Monetary))
# The trailing + must stay on the same line as the expression it continues
ggplot(heatmap_data,aes(rankF,rankR,fill = M_mean)) + geom_tile() +
  scale_fill_distiller(palette = 'RdYlGn',direction = 1)


4.3 RFM histogram:

# RFM histograms

p1 <- ggplot(salesRFM,aes(Recency)) + geom_histogram(bins = 10,fill = '#362D4C')
p2 <- ggplot(salesRFM,aes(Frequency)) + geom_histogram(bins = 10,fill = '#362D4C')  
p3 <- ggplot(salesRFM,aes(Monetary)) + geom_histogram(bins = 10,fill = '#362D4C')  
ggplot2.multiplot(p1,p2,p3, cols=3)



4.4 RFM pairwise cross scatter plot:

# RFM pairwise scatter plots

p1 <- ggplot(salesRFM,aes(Monetary,Recency)) + geom_point(shape = 21,fill = '#362D4C' ,colour = 'white',size = 2)
p2 <- ggplot(salesRFM,aes(Monetary,Frequency)) + geom_point(shape = 21,fill = '#362D4C' ,colour = 'white',size = 2)  
p3 <- ggplot(salesRFM,aes(Frequency,Recency)) + geom_point(shape = 21,fill = '#362D4C' ,colour = 'white',size = 2)  
ggplot2.multiplot(p1,p2,p3, cols=1)


5. Export the results:

# Export the result data
write.csv(salesRFM,'salesRFM.csv')



Python:

1. Data preparation

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import time
import numpy   as np
import pandas  as pd
import savReaderWriter as spss
import os
from  datetime import datetime,timedelta
np.random.seed(233333)
os.chdir('D:/R/File')
pd.set_option('display.float_format', lambda x: '%.3f' % x)

with spss.SavReader('trade.sav', returnHeader=True, ioUtf8=True, rawMode=True, ioLocale='chinese') as reader:
    rows = list(reader)  # the first row is the header because returnHeader=True
    mydata = pd.DataFrame(rows[1:], columns=rows[0])
    mydata['交易日期'] = mydata['交易日期'].map(lambda x: reader.spss2strDate(x, "%Y-%m-%d", None))

mydata.rename(columns={'订单ID':'OrderID','客户ID':'UserID','交易日期':'PayDate','交易金额':'PayAmount'}, inplace=True)
# Replace the original dates with random dates in 2017 (seeded above)
start_time = int(time.mktime(time.strptime('2017/01/01', '%Y/%m/%d')))
end_time   = int(time.mktime(time.strptime('2017/12/31', '%Y/%m/%d')))
mydata['PayDate'] = pd.Series(np.random.randint(start_time, end_time, len(mydata))).map(lambda x: time.strftime("%Y-%m-%d", time.localtime(x)))
# Days from each transaction to today; the format must match the "%Y-%m-%d" strings above
mydata['interval'] = [(datetime.now() - pd.to_datetime(i, format='%Y-%m-%d')).days for i in mydata['PayDate']]
mydata = mydata.astype({'OrderID':'int64','UserID':'int64','PayAmount':'int64'})
print('---------#######-----------')
print(mydata.head())
print('---------#######-----------')
print(mydata.tail())
print('…………………………………………………………………………')
print(mydata.dtypes)
print('---------#######------------')

2. Score calculation:

# Aggregate transaction frequency, total amount, and days since the most recent purchase by user ID

mydata.set_index('UserID', inplace=True)
salesRFM = mydata.groupby(level = 0).agg({
    'PayAmount': 'sum',
    'PayDate':   'count',
    'interval':  'min'})

# Make the column names more meaningful
salesRFM.rename(columns={
    'PayAmount': 'Monetary',
    'PayDate':   'Frequency',
    'interval':  'Recency'}, inplace=True)
salesRFM.head()
# Quintile binning; the categorical labels are cast to integers so they can be used in arithmetic

salesRFM = salesRFM.assign(
    rankR = pd.qcut(salesRFM['Recency'],   q = [0, .2, .4, .6, .8, 1.], labels = [5,4,3,2,1]).astype('int64'),
    rankF = pd.qcut(salesRFM['Frequency'], q = [0, .2, .4, .6, .8, 1.], labels = [1,2,3,4,5]).astype('int64'),
    rankM = pd.qcut(salesRFM['Monetary'],  q = [0, .2, .4, .6, .8, 1.], labels = [1,2,3,4,5]).astype('int64'))
salesRFM['rankRMF'] = 100*salesRFM['rankR'] + 10*salesRFM['rankF'] + 1*salesRFM['rankM']
# Feature scaling: 0-1 (min-max) normalization

from sklearn import preprocessing
min_max_scaler = preprocessing.MinMaxScaler()
salesRFM1 = min_max_scaler.fit_transform(salesRFM.loc[:,['Recency','Frequency','Monetary']].values)
salesRFM = salesRFM.assign(rankR1 = 1 - salesRFM1[:,0], rankF1 = salesRFM1[:,1], rankM1 = salesRFM1[:,2])
salesRFM['rankRFM1'] = 0.5*salesRFM['rankR1'] + 0.3*salesRFM['rankF1'] + 0.2*salesRFM['rankM1']

3. Customer classification:

# Classify each RFM score as high (2) or low (1) relative to its mean:

salesRFM = salesRFM.astype({'rankR':'int64','rankF':'int64','rankM':'int64'})
salesRFM = salesRFM.assign(
    R_S = salesRFM['rankR'].map(lambda x: 2 if x > salesRFM['rankR'].mean() else 1),
    F_S = salesRFM['rankF'].map(lambda x: 2 if x > salesRFM['rankF'].mean() else 1),
    M_S = salesRFM['rankM'].map(lambda x: 2 if x > salesRFM['rankM'].mean() else 1))
# Assign customer types:

salesRFM['Custom'] = None  # object column, filled in below
salesRFM.loc[(salesRFM['R_S'] == 2) & (salesRFM['F_S'] == 2) & (salesRFM['M_S'] == 2),'Custom']  = 'High-value customer'
salesRFM.loc[(salesRFM['R_S'] == 1) & (salesRFM['F_S'] == 2) & (salesRFM['M_S'] == 2),'Custom']  = 'Key customer to maintain'
salesRFM.loc[(salesRFM['R_S'] == 2) & (salesRFM['F_S'] == 1) & (salesRFM['M_S'] == 2),'Custom']  = 'Key customer to develop'
salesRFM.loc[(salesRFM['R_S'] == 1) & (salesRFM['F_S'] == 1) & (salesRFM['M_S'] == 2),'Custom']  = 'Key customer to win back'
salesRFM.loc[(salesRFM['R_S'] == 2) & (salesRFM['F_S'] == 2) & (salesRFM['M_S'] == 1),'Custom']  = 'Key customer to protect'
salesRFM.loc[(salesRFM['R_S'] == 1) & (salesRFM['F_S'] == 2) & (salesRFM['M_S'] == 1),'Custom']  = 'General customer to protect'
salesRFM.loc[(salesRFM['R_S'] == 2) & (salesRFM['F_S'] == 1) & (salesRFM['M_S'] == 1),'Custom']  = 'General customer to develop'
salesRFM.loc[(salesRFM['R_S'] == 1) & (salesRFM['F_S'] == 1) & (salesRFM['M_S'] == 1),'Custom']  = 'Potential customer'
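
The R section also aggregates the mean Monetary value by rankF and rankR for its heat map. A minimal pandas sketch of that same aggregation is shown below; it runs on a small made-up frame named `demo` rather than on the real `salesRFM`, and it stops at the aggregated table without plotting.

```python
import pandas as pd

# Hypothetical scored customers; in practice this would be the salesRFM frame.
demo = pd.DataFrame({
    "rankR":    [5, 5, 4, 4, 1, 1],
    "rankF":    [5, 5, 3, 3, 1, 2],
    "Monetary": [900, 700, 400, 200, 50, 80],
})

# Mean Monetary for every (rankR, rankF) cell; empty cells become NaN.
heatmap_data = demo.pivot_table(index="rankR", columns="rankF",
                                values="Monetary", aggfunc="mean")
print(heatmap_data)
```

The resulting table has one row per rankR level and one column per rankF level, which is the shape most heat map functions expect.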



The RFM model is only a preliminary exploratory analysis. Its output indicators can also feed other classification and dimensionality-reduction models to dig deeper into the value of customer data and uncover potential marketing opportunities.

The data files and code are available at the following GitHub link:

https://github.com/ljtyduyu/DataWarehouse/tree/master/Model

