Practical Python data analysis: e-commerce retail data analysis for an online retail business (with source code)

Retail sales data analysis for an e-commerce platform

Earlier posts on this blog already used this online retail dataset for data analysis; in this post we re-analyze the same data from a different angle.

Data source and data structure

The data comes from an online retail business engaged in foreign trade (data download).

The fields are explained in the table below:

Field        Explanation
InvoiceNo    Order number, a 6-digit integer; an order number beginning with the letter C indicates a return
StockCode    Product code, a 5-digit integer
Description  Product description
Quantity     Quantity of the product; a negative value indicates a return
InvoiceDate  Order date and time
UnitPrice    Price per unit of product, in pounds sterling
CustomerID   Customer number, a 5-digit integer
Country      Name of the country/region where the customer is located
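
As a rough illustration of how these fields can be checked in code, the sketch below parses a few made-up rows (not taken from the real dataset) and flags returns using the two rules from the table: an InvoiceNo beginning with C, or a negative Quantity. The sample rows and the is_return column name are assumptions for illustration only.

```python
import io

import pandas as pd

# Made-up sample rows that follow the field layout described above
sample_csv = io.StringIO(
    "InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country\n"
    "536365,11111,SAMPLE ITEM A,6,12/1/2010 8:26,2.55,17850,United Kingdom\n"
    "C536379,22222,SAMPLE ITEM B,-1,12/1/2010 9:41,1.85,14527,United Kingdom\n"
    "536380,33333,SAMPLE ITEM C,24,12/1/2010 9:45,1.45,,France\n"
)

# Read InvoiceNo and StockCode as strings so a leading "C" is preserved
sample = pd.read_csv(sample_csv, dtype={"InvoiceNo": str, "StockCode": str})

# A row is treated as a return if the order number starts with "C"
# or the quantity is negative
sample["is_return"] = sample["InvoiceNo"].str.startswith("C") | (sample["Quantity"] < 0)

print(sample[["InvoiceNo", "Quantity", "is_return"]])
print("Rows with missing CustomerID:", sample["CustomerID"].isna().sum())
```

Reading the order number as a string matters here: parsed as a number, the C prefix would either raise an error or force a mixed-type column.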

Objective

  • User classification (RFM model): compare different user groups across dimensions such as time and region, on indicators such as transaction volume and transaction amount, and make optimization recommendations based on the results
  • R (Recency): time since the last purchase (the interval between the customer's last purchase and a reference date)
  • F (Frequency): purchase frequency (number of purchases per unit of time)
  • M (Monetary): purchase amount (total amount spent per unit of time)
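
To make the three definitions concrete, here is a minimal sketch that computes R, F, and M for a handful of made-up transactions. The data and the names tx and rfm_raw are assumptions for illustration; the reference date is taken to be the latest order date in the data.

```python
import pandas as pd

# Made-up transactions for illustration only
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 2, 3],
    "InvoiceNo": ["1001", "1002", "2001", "2001", "2002", "3001"],
    "InvoiceDate": pd.to_datetime(
        ["2011-01-05", "2011-03-01", "2011-02-10", "2011-02-10", "2011-03-10", "2010-12-01"]
    ),
    "Amount": [20.0, 30.0, 5.0, 15.0, 40.0, 100.0],
})

# Reference date: the latest order date in the data
reference = tx["InvoiceDate"].max()

grouped = tx.groupby("CustomerID")
rfm_raw = pd.DataFrame({
    # R: days between the reference date and the customer's last purchase
    "R": (reference - grouped["InvoiceDate"].max()).dt.days,
    # F: number of distinct orders (two rows of one invoice count once)
    "F": grouped["InvoiceNo"].nunique(),
    # M: total amount spent
    "M": grouped["Amount"].sum(),
})

print(rfm_raw)
```

Customer 2 has three rows but only two distinct order numbers, so F is 2, which is why the count uses distinct invoices rather than rows.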

First, the overall approach:
1. Clean the data: remove returns and outliers.
2. Compute the value of each RFM component according to the model's definitions.
3. Bin each of the three components into scores, then split each score into high and low.
4. Use a custom function to classify customers by the resulting combination.
5. Display the resulting groups with a bar chart and a pie chart.

Let's look at the code, which is commented in detail:
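
Steps 3 and 4 can be sketched on a small made-up table as follows. The bin edges are the ones used in the code further down; everything else (the toy values, the H/L codes, the English segment names, and comparing against the column mean rather than a fixed number) is an assumption chosen for illustration, not the only way to do it.

```python
import pandas as pd

# Made-up R (days), F (orders), M (amount) values for four customers
rfm_raw = pd.DataFrame(
    {"R": [5, 40, 200, 400], "F": [12, 1, 6, 1], "M": [6000.0, 300.0, 2500.0, 100.0]},
    index=pd.Index(["a", "b", "c", "d"], name="CustomerID"),
)

# right=False gives left-closed, right-open intervals; a small R earns a high score
r_score = pd.cut(rfm_raw["R"], [0, 30, 90, 180, 360, 720], labels=[5, 4, 3, 2, 1], right=False)
f_score = pd.cut(rfm_raw["F"], [1, 2, 5, 10, 20, 5000], labels=[1, 2, 3, 4, 5], right=False)
m_score = pd.cut(rfm_raw["M"], [0, 500, 2000, 5000, 10000, 200000], labels=[1, 2, 3, 4, 5], right=False)

scores = pd.concat([r_score, f_score, m_score], axis=1).astype(float)

# A score above its column mean counts as high ("H"), otherwise low ("L")
high = scores > scores.mean()
to_code = {True: "H", False: "L"}
code = high["R"].map(to_code) + high["F"].map(to_code) + high["M"].map(to_code)

# Segment names in R, F, M order (English names are illustrative translations)
segment_names = {
    "HHH": "Important value customer",
    "HLH": "Important development customer",
    "LHH": "Important retention customer",
    "LLH": "Important win-back customer",
    "HHL": "General value customer",
    "HLL": "General development customer",
    "LHL": "General retention customer",
    "LLL": "General win-back customer",
}
segments = code.map(segment_names)
print(pd.DataFrame({"code": code, "segment": segments}))
```

A dictionary lookup keeps all eight combinations visible in one place, which makes it easier to spot a duplicated or missing label than a long if/elif chain.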

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly as py
import plotly.graph_objects as go
import seaborn as sns

sns.set_style('darkgrid')
data = pd.read_csv('data/data.csv')
# print(data.head(10))

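# Data cleaning
# print(data.info())
# The info() output shows many missing CustomerID values and a few missing Description values

# Remove duplicate rows
data = data.drop_duplicates()
# Use the summary statistics to check for outliers
print(data.describe())
# The result shows negative UnitPrice values; keep only positive prices and drop returns (negative Quantity)
# .copy() makes the filtered frame independent, so later column assignments are safe
data = data[(data['UnitPrice'] > 0) & (data['Quantity'] > 0)].copy()

# Put all rows with a missing CustomerID into one group labeled 'U'
data['CustomerID'] = data['CustomerID'].fillna('U')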

# ----- Recency: time since each customer's last purchase -----

# Convert to datetime; the time of day is not needed here, so keep only the date part
data['Date'] = [i[0] for i in data['InvoiceDate'].str.split(' ')]

data['InvoiceDate'] = pd.to_datetime(data['Date'], errors='coerce')
# Add three columns recording year, month, and day
data['Year'] = data['InvoiceDate'].dt.year
data['Month'] = data['InvoiceDate'].dt.month
data['Day'] = data['InvoiceDate'].dt.day

# Use the latest date across all orders as the reference time
Customerdata = data['InvoiceDate'].max() - data.groupby('CustomerID')['InvoiceDate'].max()
R_Customer = Customerdata.dt.days

# print(R_Customer.describe())  # mean, standard deviation, min, and max
plt.hist(R_Customer, bins=30)
plt.title("Days since last purchase")
# plt.show()
# -----

# Frequency: the number of distinct orders placed by each customer
F_Custumer = data.groupby('CustomerID')['InvoiceNo'].nunique()

# Monetary: the total amount spent by each customer
data['Amount'] = data['UnitPrice']*data['Quantity']
M_Custumer = data.groupby('CustomerID')['Amount'].sum()

# All three metrics are computed; next, bin them into scores and visualize
# Note: pd.cut assigns NaN to values that fall outside the bin edges
R_bins = [0,30,90,180,360,720]
F_bins = [1,2,5,10,20,5000]
M_bins = [0,500,2000,5000,10000,200000]
R_score = pd.cut(R_Customer,R_bins,labels=[5,4,3,2,1],right=False)  # right=False makes intervals left-closed, right-open
F_score = pd.cut(F_Custumer,F_bins,labels=[1,2,3,4,5],right=False)
M_score = pd.cut(M_Custumer,M_bins,labels=[1,2,3,4,5],right=False)

rfm = pd.concat([R_score,F_score,M_score],axis=1)
# The columns keep the names of the original Series by default; rename them here
rfm.rename(columns={'InvoiceDate':'R_score','InvoiceNo':'F_score','Amount':'M_score'},inplace=True)

rfm = rfm.astype(float)  # convert to float for the calculations below

print(rfm.describe())  # use the mean of each score to split users into levels
# R_score mean: 3.82
# F_score mean: 2.02
# M_score mean: 1.88

# Mark each score as high ('H') or low ('L') relative to its mean
rfm['R_score'] = np.where(rfm['R_score']>3.82,'H','L')
rfm['F_score'] = np.where(rfm['F_score']>2.02,'H','L')
rfm['M_score'] = np.where(rfm['M_score']>1.88,'H','L')

# Concatenate the three labels in R, F, M order
rfm['All'] = rfm['R_score']+rfm['F_score']+rfm['M_score']

# Map each H/L combination to a user level
def CheckClass(x):
    if x=='HHH':
        return 'Important value customer'
    elif x=='HLH':
        return 'Important development customer'
    elif x=='HHL':
        return 'General value customer'
    elif x=='HLL':
        return 'General development customer'
    elif x=='LHH':
        return 'Important retention customer'
    elif x=='LHL':
        # The original returned the 'HLH' label here, duplicating that level
        return 'General retention customer'
    elif x=='LLH':
        return 'Important win-back customer'
    elif x=='LLL':
        return 'General win-back customer'

rfm['UserLevel'] = rfm['All'].apply(CheckClass)

# Count customers per user level once and reuse the result in both charts
level_counts = rfm['UserLevel'].value_counts()

# Bar chart: number of customers per user level
Bar = go.Bar(x=level_counts.index,y=level_counts,opacity=0.5,marker=dict(color='orange'))
layout = go.Layout(title = "Number of customers by user level")
fig = go.Figure(data=[Bar],layout=layout)
py.offline.plot(fig,filename='CustomerNumber.html')

# Pie chart: share of each user level
Pie = go.Pie(labels=level_counts.index,values=level_counts)
layout = go.Layout(title = "Share of customers by user level")
fig = go.Figure(data=[Pie],layout=layout)
py.offline.plot(fig,filename='CustomerClass.html')



The code outputs two figures:
[Figure: bar chart of the number of customers per user level]

[Figure: pie chart of the share of each user level]

Conclusion and summary

This exercise is mainly a brief user-profiling study that stratifies the users. As the charts show, important value customers and important development customers account for the largest shares. For the former, whose interval since the last purchase is relatively long, appropriate discount coupons or promotions can be used to increase how often these users buy again. For the latter, who in addition to a long purchase interval also have a relatively low purchase frequency, suitable campaigns can likewise be planned. The data can also be analyzed from other angles, for example by computing order statistics per month and comparing them with the same period of the previous year to see whether major fluctuations occur.
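
The month-by-month comparison mentioned above could be sketched roughly as follows. The orders are made up, and the names orders, monthly_orders, and yoy_change are assumptions for illustration; the real data would first go through the same cleaning as in the main code.

```python
import pandas as pd

# Made-up orders spanning two years, for illustration only
orders = pd.DataFrame({
    "InvoiceNo": ["1", "2", "3", "4", "5", "6", "7", "8"],
    "InvoiceDate": pd.to_datetime([
        "2010-11-05", "2010-12-03", "2010-12-20",
        "2011-11-02", "2011-11-18", "2011-12-01", "2011-12-09", "2011-12-30",
    ]),
})

orders["Year"] = orders["InvoiceDate"].dt.year
orders["Month"] = orders["InvoiceDate"].dt.month

# Rows = month, columns = year; each cell is the number of distinct orders
monthly_orders = (
    orders.groupby(["Month", "Year"])["InvoiceNo"].nunique().unstack("Year")
)

# Relative change of each month against the same month of the previous year
yoy_change = (monthly_orders[2011] - monthly_orders[2010]) / monthly_orders[2010]

print(monthly_orders)
print(yoy_change)
```

A month that is missing in one of the two years would show up as NaN in monthly_orders, so such months need to be handled explicitly before interpreting the change.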

Origin blog.csdn.net/lzx159951/article/details/104455142