O2O Coupon Offline Usage Data Analysis (Python)

1 Project Background

With the improvement and spread of mobile devices, the "mobile Internet plus traditional industries" model has entered a stage of rapid development, and O2O (Online to Offline) consumption is its most eye-catching part. According to incomplete statistics, the O2O industry has at least 10 start-ups valued at hundreds of millions, as well as giants valued at tens of billions.

The O2O industry reaches hundreds of millions of consumers, and apps record more than 10 billion user behavior and location records every day, which makes it one of the best meeting points of big data research and commercial operations. Using coupons to re-engage existing customers or attract new ones is an important O2O marketing method. However, randomly distributed coupons are a pointless disturbance for most users. For merchants, spamming coupons may damage brand reputation and makes marketing costs hard to estimate. Personalized delivery is an important technique for raising the coupon redemption rate: it lets consumers with certain preferences get real benefits and gives merchants stronger marketing capabilities.

2 Analysis objectives

1. Analyze the factors that influence how busy a store's traffic is

2. Analyze customer spending habits

3. Analyze the usage of the delivered coupons

3 Data introduction

4 Data analysis

4.1 Load data

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
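plt.rcParams['font.sans-serif'] = 'SimHei'  # use the SimHei font so Chinese characters render correctly
plt.rcParams['axes.unicode_minus'] = False  # render the minus sign correctly

# parse_dates tells read_csv to convert the listed columns to datetime
offline = pd.read_csv('ccf_offline_stage1_train.csv', parse_dates=['Date_received', 'Date'])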
offline.info()

offline.head(10)

4.2 Data preprocessing

4.2.1 Check the null value

Checking the null values shows that the coupon id, discount rate, and coupon received date have the same number of null values, so these three columns may be null in the same rows.
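The check described above can be sketched with `isnull().sum()`. The snippet below is only an illustration: it runs on a tiny made-up table (`sample`) that mimics the columns loaded in 4.1, not on the real data, and the values are invented.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the `offline` table; the real data comes from
# ccf_offline_stage1_train.csv. All values here are made up.
sample = pd.DataFrame({
    'Coupon_id': [1001.0, np.nan, 1002.0, np.nan],
    'Discount_rate': ['150:20', np.nan, '0.95', np.nan],
    'Date_received': pd.to_datetime(['2016-05-28', None, '2016-02-17', None]),
    'Date': pd.to_datetime([None, '2016-03-01', '2016-02-20', '2016-06-13']),
})

# Count the missing values in each column
null_counts = sample.isnull().sum()
print(null_counts)
```

On the real table the same call would be `offline.isnull().sum()`; equal counts in three columns suggest, but do not prove, that the nulls fall in the same rows.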

4.2.2 Adjust the data format of the "Discount_rate" column 

Convert coupons given in threshold form (x:y, that is, y off when spending x) into a discount rate, (x - y) / x, so that the whole column uses the same form.

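# Replace NaN with the string 'null' so the function below can test for it
offline['Discount_rate'] = offline['Discount_rate'].fillna('null')

# Convert one Discount_rate entry into a numeric discount rate
def discount_rate_func(s):
    if ':' in s:
        # Threshold form 'x:y': the rate is the share of x actually paid
        split = s.split(':')
        discount_rate = (int(split[0]) - int(split[1])) / int(split[0])
        return round(discount_rate, 2)
    elif s == 'null':
        return np.nan  # np.NaN was removed in NumPy 2.0
    else:
        return float(s)

offline['Discount_rate'] = offline['Discount_rate'].map(discount_rate_func)
offline.head(10)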
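As a worked example of the conversion, the sketch below restates the same logic under a hypothetical name (`to_discount_rate`) so it runs on its own, and applies it to three made-up inputs. It assumes, as the formula implies, that 'x:y' means y off when spending x.

```python
import numpy as np

def to_discount_rate(s):
    """Same logic as discount_rate_func, restated so this snippet runs alone."""
    if s == 'null':
        return np.nan
    if ':' in s:
        # 'x:y' -> share of the threshold amount x that is actually paid
        full, reduction = (int(part) for part in s.split(':'))
        return round((full - reduction) / full, 2)
    return float(s)

rate_threshold = to_discount_rate('150:20')  # (150 - 20) / 150
rate_plain = to_discount_rate('0.95')        # already a rate
rate_missing = to_discount_rate('null')      # no coupon
print(rate_threshold, rate_plain, rate_missing)
```

Note that a lower rate means a deeper discount: '150:20' becomes 0.87, which is a better deal than a plain 0.95.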

4.2.3 Analysis of Null Value Relationship

Coupon_id is the coupon id. If it is null, no coupon was received, so the Discount_rate and Date_received columns carry no meaning for that row. This matches the guess above that all three are null at the same time.

Check whether the three columns are always either all null or all non-null.
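One way to run this check is sketched below, again on a small made-up table rather than the real data. The idea is to build a boolean null mask for the three columns and confirm that every row is either entirely null or entirely non-null in them.

```python
import numpy as np
import pandas as pd

# Made-up rows; on the real data use `offline` instead of `sample`
sample = pd.DataFrame({
    'Coupon_id': [1001.0, np.nan, 1002.0, np.nan],
    'Discount_rate': [0.87, np.nan, 0.95, np.nan],
    'Date_received': pd.to_datetime(['2016-05-28', None, '2016-02-17', None]),
})

cols = ['Coupon_id', 'Discount_rate', 'Date_received']
null_flags = sample[cols].isnull()

# Rows where all three columns are null, and rows where none is null
all_null = null_flags.all(axis=1)
none_null = (~null_flags).all(axis=1)

# If the guess holds, every row falls into one of the two groups
consistent = bool((all_null | none_null).all())
print(all_null.sum(), none_null.sum(), consistent)
```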

From the above inspection results, we know that when there is no coupon, the latter two fields also lose their meaning.

Note: These null values cannot simply be dropped, because there are still purchases made without coupons. In other words, the null values carry meaning of their own.

5 Specific analysis

5.1 Analysis of Consumption Situation Using Coupons

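There are four situations, depending on whether a coupon was received (Coupon_id) and whether a purchase was made (Date):

# Coupon received, no purchase
cpon_no_consume = offline[(offline['Date'].isnull() & offline['Coupon_id'].notnull())]
# No coupon, no purchase
no_cpon_no_consume = offline[(offline['Date'].isnull() & offline['Coupon_id'].isnull())]
# No coupon, purchase made
no_cpon_consume = offline[(offline['Date'].notnull() & offline['Coupon_id'].isnull())]
# Coupon received, purchase made
cpon_consume = offline[(offline['Date'].notnull() & offline['Coupon_id'].notnull())]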

Draw the proportions as a pie chart:

# Collect the group sizes in one Series
consume_status = {'cpon_no_consume':len(cpon_no_consume),'no_cpon_consume':len(no_cpon_consume),'cpon_consume':len(cpon_consume)}
consume_status = pd.Series(consume_status)
# fig is the canvas, ax is the axes
fig,ax = plt.subplots(1,1,figsize=(8,10))
# Draw the pie chart on ax
consume_status.plot.pie(ax=ax,
                        autopct='%1.1f%%',
                        shadow=True,
                        explode=[0.02,0.02,0.02],    # gap between wedges
                        textprops={'fontsize':15,'color':'blue'},    # text properties
                        wedgeprops={'linewidth':1,'edgecolor':'black'},
                        labels=['Coupon, no purchase\n({})'.format(len(cpon_no_consume)),
                                'Purchase, no coupon\n({})'.format(len(no_cpon_consume)),
                                'Purchase with coupon\n({})'.format(len(cpon_consume)),]      # wedge labels with counts
                       )
ax.set_ylabel('')  # remove the y-label on the left
ax.set_title('Share of consumption by coupon status')  # title
plt.legend(labels=['Coupon, no purchase','Purchase, no coupon','Purchase with coupon'])  # legend

Some simple conclusions can be drawn from this (omitted here).

5.2 Analysis of distance and discount rate among consumers with coupons

An average distance of 0 means the distance is less than 500 meters.

About 1,431 merchants have coupon-using customers who are, on average, less than 500 meters away.

This gives information about the strength of the discount.
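The per-merchant averages discussed in this section can be sketched with a `groupby`. This is a minimal sketch under assumptions: the column names `Merchant_id` and `Distance` are assumed, and the rows are made up to stand in for `cpon_consume` (coupon received and used).

```python
import pandas as pd

# Made-up rows standing in for cpon_consume.
# `Merchant_id` and `Distance` are assumed column names.
cpon_consume_sample = pd.DataFrame({
    'Merchant_id': [1, 1, 2, 2, 3],
    'Distance': [0, 0, 1, 3, 0],
    'Discount_rate': [0.9, 0.8, 0.95, 0.85, 0.5],
})

by_merchant = cpon_consume_sample.groupby('Merchant_id')

# Average customer distance per merchant; a mean of 0 means every
# coupon-using customer of that merchant is in the nearest distance band
mean_distance = by_merchant['Distance'].mean()
n_nearby_merchants = int((mean_distance == 0).sum())

# Average discount rate per merchant (lower means a deeper discount)
mean_discount = by_merchant['Discount_rate'].mean()
print(mean_distance, n_nearby_merchants, mean_discount, sep='\n')
```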

5.3 Merchants with the largest number of shoppers with vouchers

For merchants with more than 500 coupon-using customers, join in the average customer-to-store distance and the average discount rate:
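A possible shape for this step is sketched below. It is not the original code: counting distinct `User_id` values per merchant is an assumption about how customers are counted, the column names are assumed, the data is made up, and the threshold is lowered from 500 so the toy table has matches.

```python
import pandas as pd

# Made-up rows standing in for cpon_consume; column names are assumed
cpon_consume_sample = pd.DataFrame({
    'User_id': [10, 11, 12, 10, 13, 14],
    'Merchant_id': [1, 1, 1, 2, 2, 3],
    'Distance': [0, 1, 2, 0, 0, 5],
    'Discount_rate': [0.9, 0.8, 0.7, 0.95, 0.85, 0.5],
})

MIN_USERS = 1  # the text uses 500; lowered here for the toy data

# Distinct coupon-using customers per merchant
user_counts = (cpon_consume_sample.groupby('Merchant_id')['User_id']
               .nunique()
               .rename('n_users'))
popular = user_counts[user_counts > MIN_USERS].reset_index()

# Per-merchant averages of distance and discount rate
merchant_means = (cpon_consume_sample
                  .groupby('Merchant_id')[['Distance', 'Discount_rate']]
                  .mean()
                  .reset_index())

# Attach the averages to the popular merchants, busiest first
summary = (popular.merge(merchant_means, on='Merchant_id', how='left')
           .sort_values('n_users', ascending=False)
           .reset_index(drop=True))
print(summary)
```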

5.4 The correlation coefficient between the number of shoppers and the average distance and discount strength

corr() calculates the pairwise correlation between the columns of a DataFrame (the Pearson correlation coefficient by default); values lie in [-1, 1].

1 means perfect positive correlation and -1 means perfect negative correlation.
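To show what `corr()` returns, here is a small sketch on invented numbers. The table `summary_sample` and its column names are hypothetical, and the values were chosen only to illustrate the output shape; they are not results from the dataset.

```python
import pandas as pd

# Invented per-merchant summary; not real results
summary_sample = pd.DataFrame({
    'n_users': [900, 700, 650, 520],
    'Distance': [0.2, 0.8, 1.1, 2.0],
    'Discount_rate': [0.75, 0.82, 0.85, 0.93],
})

# Pairwise Pearson correlation between all numeric columns
corr_matrix = summary_sample.corr()
print(corr_matrix)
```

The result is a symmetric matrix with 1 on the diagonal; the `n_users` row shows how the customer count moves with each of the other two columns.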

It can be concluded that the number of consumers is negatively correlated with both the average distance and the average discount rate: the shorter the distance and the lower the discount rate (that is, the deeper the discount), the more consumers a merchant has.

Origin blog.csdn.net/qq_42433311/article/details/124040604