MySQL Project - Visual Analysis of Taobao User Shopping Behavior Data

1. Project background and purpose

1.1 Project Background

        UserBehavior is a Taobao user behavior dataset provided by Alibaba for research on implicit-feedback recommendation. It contains every recorded behavior (page views, purchases, add-to-cart actions, and favorites) of about one million randomly selected users between November 25, 2017 and December 3, 2017.

1.2 Project goals

        This analysis uses Taobao user behavior data to explain, and suggest improvements for, the following issues:

  1. Analyze common e-commerce metrics for users on Taobao, build a user behavior conversion funnel, determine the drop-off rate at each stage, and identify the stages that need improvement;
  2. Study user behavior at different time scales, find when users are most active, and propose marketing strategies accordingly;
  3. Analyze user preferences for different kinds of products, and propose marketing strategies for each kind;
  4. Use the RFM model to stratify users, analyze the behavior of each user tier, and propose corresponding operating strategies.

1.3 Dataset source and introduction

        Data source: Taobao user shopping behavior dataset, Alibaba Cloud Tianchi (aliyun.com)

File name | Description | Fields
UserBehavior.csv | All user behavior data | User ID, Product ID, Product Category ID, Behavior Type, Timestamp

UserBehavior.csv
        This file contains all behaviors (page views, purchases, add-to-cart actions, and favorites) of about one million randomly selected users between November 25, 2017 and December 3, 2017. The data set is organized like MovieLens-20M: each row represents one user behavior and consists of a user ID, product ID, product category ID, behavior type, and timestamp, separated by commas. The columns are described below:

Column | Description
User ID | Integer; serialized user ID
Product ID | Integer; serialized product ID
Product category ID | Integer; serialized ID of the category the product belongs to
Behavior type | String enumeration: 'pv', 'buy', 'cart', 'fav'
Timestamp | Timestamp of when the behavior occurred

The four behavior types are:

Behavior type | Description
pv | Product detail page view, equivalent to a click
buy | Product purchase
cart | Add product to shopping cart
fav | Add product to favorites

The size of the dataset is as follows:

Dimension | Count
Users | 987,994
Products | 4,162,024
Product categories | 9,439
Total behaviors | 100,150,807

2. Analysis framework

3. Data cleaning

3.1 Data import

Create a new database using MySQL

CREATE DATABASE IF NOT EXISTS 淘宝用户行为
CHARACTER SET 'utf8mb4';

Import the external data source

Because the full data set is very large, only the first 1,000,000 rows are imported.
The source file has no header row, so in the import settings the field name row is set to 0 and the first data row is set to 1.

The import completed in 1 minute and 17 seconds.
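For readers who prefer a scripted import, the step above might be sketched as follows. This is only an illustration: the file path and the name UserBehavior_1m.csv are hypothetical, the file is assumed to have been trimmed to its first 1,000,000 lines beforehand, and the generic column names f1 to f5 are chosen so that the renaming step in 3.2 still applies.

```sql
-- Hypothetical scripted import; the file name and path are assumptions.
-- The CSV is assumed to be pre-trimmed to its first 1,000,000 lines.
USE 淘宝用户行为;

CREATE TABLE IF NOT EXISTS userbehavior (
	f1 VARCHAR(255),
	f2 VARCHAR(255),
	f3 VARCHAR(255),
	f4 VARCHAR(255),
	f5 VARCHAR(255)
);

-- Requires local_infile to be enabled on both the server and the client.
LOAD DATA LOCAL INFILE '/path/to/UserBehavior_1m.csv'
INTO TABLE userbehavior
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(f1, f2, f3, f4, f5);

-- Sanity check: should match the number of lines in the trimmed file.
SELECT COUNT(*) AS row_cnt FROM userbehavior;
```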

3.2 Renaming column names

ALTER TABLE userbehavior
	CHANGE f1 user_id VARCHAR (255),
	CHANGE f2 item_id VARCHAR (255),
	CHANGE f3 category VARCHAR (255),
	CHANGE f4 behavior VARCHAR (255),
	CHANGE f5 time_stamp VARCHAR (255);

The columns can also be renamed directly during the import.

3.3 Remove duplicate values

        To check for duplicates, treat user_id, item_id, and time_stamp as a composite key and group the data by these three columns. If the data set has no duplicates, COUNT(*) is 1 for every group; if duplicates exist, COUNT(*) is greater than 1 for at least one group. A HAVING clause can therefore be used to filter for groups where COUNT(*) > 1.

SELECT user_id, item_id, time_stamp
FROM userbehavior
GROUP BY user_id, item_id, time_stamp
HAVING COUNT(*) > 1;

The result shows no duplicate values.
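Had the check returned rows, one way to drop the extra copies is sketched below. It assumes MySQL 8.0 or later (for ROW_NUMBER) and uses a hypothetical table name, userbehavior_dedup. Which row survives within a duplicate group is arbitrary here (the first by behavior), which is a simplification.

```sql
-- Sketch: keep one row per (user_id, item_id, time_stamp) combination.
-- userbehavior_dedup is a hypothetical table name; requires MySQL 8.0+.
CREATE TABLE userbehavior_dedup AS
SELECT user_id, item_id, category, behavior, time_stamp
FROM (
	SELECT user_id, item_id, category, behavior, time_stamp
		,ROW_NUMBER() OVER (
			PARTITION BY user_id, item_id, time_stamp
			ORDER BY behavior
		) AS rn
	FROM userbehavior
) AS t
WHERE rn = 1;
```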

3.4 View missing values

        To check for missing values, count the non-NULL values in each field (COUNT(column) ignores NULLs). If every count equals the total number of rows, no field has missing values.

SELECT count(user_id), count(item_id), count(category), count(behavior), count(time_stamp)
FROM userbehavior;

        The counts are equal for every field, so there are no missing values.
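Because every field was imported as VARCHAR, a missing value could also appear as an empty string rather than NULL. A slightly stricter check, offered here as a sketch with illustrative alias names, counts both cases per field:

```sql
-- Count NULLs and empty strings per field; every *_missing value should be 0.
SELECT COUNT(*) AS total_rows
	,SUM(user_id IS NULL OR user_id = '') AS user_id_missing
	,SUM(item_id IS NULL OR item_id = '') AS item_id_missing
	,SUM(category IS NULL OR category = '') AS category_missing
	,SUM(behavior IS NULL OR behavior = '') AS behavior_missing
	,SUM(time_stamp IS NULL OR time_stamp = '') AS time_stamp_missing
FROM userbehavior;
```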

3.5 Time format conversion

-- Add time, date, and hour columns
ALTER TABLE userbehavior
	ADD time TIMESTAMP,
	ADD date VARCHAR(10),
	ADD hour VARCHAR(10);
-- Convert the Unix timestamp into the new columns
UPDATE userbehavior
SET time = FROM_UNIXTIME(time_stamp, '%Y-%m-%d %H:%i:%s'),
	date = FROM_UNIXTIME(time_stamp, '%Y-%m-%d'),
	hour = FROM_UNIXTIME(time_stamp, '%H');
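FROM_UNIXTIME returns its result in the session time zone, so the converted dates and hours depend on that setting. The sketch below assumes the timestamps are meant to be read as China Standard Time (UTC+8), which is an assumption about the data set rather than something stated above; the setting would need to be in place before the UPDATE runs.

```sql
-- Assumption: timestamps should be interpreted as UTC+8.
SET time_zone = '+08:00';

-- 1511539200 is 2017-11-25 00:00:00 in UTC+8, the start of the window.
SELECT FROM_UNIXTIME(1511539200) AS window_start;

-- Spot-check a few converted rows against the raw values.
SELECT time_stamp, `time`, `date`, `hour`
FROM userbehavior
LIMIT 5;
```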

3.6 Remove outliers

        Check whether all dates fall within the analysis window, that is, between November 25, 2017 and December 3, 2017.

-- Check whether all dates are between 2017-11-25 and 2017-12-03
SELECT MIN(date), MAX(date)
FROM userbehavior;

Remove outliers

-- Delete rows whose date is outside 2017-11-25 to 2017-12-03
DELETE FROM userbehavior
WHERE date < '2017-11-25' OR date > '2017-12-03';

 

A total of 470 out-of-range rows were deleted.

Verify that the data is now clean:

SELECT MIN(date), MAX(date)
FROM userbehavior;

 

4. Data analysis

4.1 Analysis of User Behavior Based on User Behavior Conversion Funnel Model

4.1.1 Analysis of common e-commerce indicators

4.1.1.1 UV, PV, and PV/UV

-- UV, PV, and PV/UV metrics
SELECT COUNT(DISTINCT user_id) AS UV
	,SUM(IF(behavior = 'pv', 1, 0)) AS PV
	,SUM(IF(behavior = 'buy', 1, 0)) AS Buy
	,SUM(IF(behavior = 'cart', 1, 0)) AS Cart
	,SUM(IF(behavior = 'fav', 1, 0)) AS Fav
	,SUM(IF(behavior = 'pv', 1, 0)) / COUNT(DISTINCT user_id) AS 'PV/UV'
FROM userbehavior;

        Total number of visiting users (UV): 9,739

        Total page views (PV): 895,636

        Average page views per user (PV/UV) in the statistical window: 895,636 / 9,739 ≈ 92

4.1.1.2 Repurchase rate

        Repurchase rate: within a given time window, the proportion of repeat buyers (users who purchased two or more times) among all users who purchased. Purchases are not deduplicated by day, so two purchases on the same day count as a repeat purchase.

        Create a view of user behavior data grouped by user_id to simplify the following queries.

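-- Create a view of user behavior data grouped by user_id
-- Columns: 用户行为总数 = total behaviors, 浏览数 = views, 收藏数 = favorites, 加购数 = add-to-cart, 购买数 = purchases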
CREATE VIEW 用户行为数据 AS
	SELECT user_id
		,COUNT(behavior) AS 用户行为总数
		,SUM(IF(behavior = 'pv', 1, 0)) AS '浏览数'
		,SUM(IF(behavior = 'fav', 1, 0)) AS '收藏数'
		,SUM(IF(behavior = 'cart', 1, 0)) AS '加购数'
		,SUM(IF(behavior = 'buy', 1, 0)) AS '购买数'
	FROM userbehavior
	GROUP BY user_id
	ORDER BY 用户行为总数 DESC;
	
SELECT * FROM 用户行为数据;

Repurchase rate:

-- Repurchase rate (复购总人数 = repeat buyers, 购买总人数 = all buyers, 复购率 = repurchase rate in %)
SELECT SUM(IF(购买数 > 1, 1, 0)) AS '复购总人数'
	,COUNT(user_id) AS '购买总人数'
	,ROUND(100 * SUM(IF(购买数 > 1, 1, 0)) / COUNT(user_id), 2) AS '复购率'
FROM 用户行为数据
WHERE 购买数 > 0;

  

        The repurchase rate is 66.21%, which suggests high loyalty among Taobao users.

4.1.1.3 Bounce rate

        Bounce rate: the share of all visiting users who viewed only a single page, or the share of all visiting users who left the site from the home page.

        The bounce rate reflects how well users accept a site's content and how attractive the site is to them. Whether the content helps and retains users shows up directly in the bounce rate, which makes it an important measure of content quality.

-- Bounce rate (仅访问一次页面的用户数 = users with exactly one recorded behavior)
SELECT COUNT(*) AS '仅访问一次页面的用户数'
FROM 用户行为数据
WHERE 用户行为总数 = 1;

        Within the statistical window, no user left Taobao after a single page view, so the bounce rate is 0. This suggests that the products or product detail pages are attractive enough to keep users on Taobao.

        Taken together, the repurchase rate and the bounce rate indicate that Taobao has high user loyalty and high-quality content that keeps users coming back. The focus should therefore be on user relationships and on maintaining loyalty.
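The query above returns only the number of single-behavior users. Expressing it as a rate could look like the sketch below, which reuses the 用户行为数据 view; the English alias names are illustrative.

```sql
-- Bounce rate as a percentage of all users in the view.
SELECT SUM(用户行为总数 = 1) AS single_action_users
	,COUNT(*) AS total_users
	,ROUND(100 * SUM(用户行为总数 = 1) / COUNT(*), 2) AS bounce_rate_pct
FROM 用户行为数据;
```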

4.1.2 Analysis of User Behavior Conversion Funnel Model

        The funnel analysis model is widely used in data analysis across industries to evaluate the overall conversion rate and the conversion rate of each stage, to assess the effect of promotional campaigns, and, combined with other analysis models, to study user behavior in depth. The goal is to find the causes of user drop-off, to increase user numbers, activity, and retention, and to put analysis and decision-making on a sounder footing.

        A commonly used funnel: home page -> product detail page -> add to shopping cart -> submit order -> pay for order

        This analysis uses only the product detail page (pv), add to shopping cart (cart), and order payment (buy) data, so the funnel is simplified to: product detail page -> add to shopping cart -> pay for order.

Conversion funnel of total user behavior (PV)

-- Funnel of total user behavior (event counts per behavior type)
SELECT behavior, COUNT(*)
FROM userbehavior
GROUP BY behavior
ORDER BY behavior DESC;

Total user behavior conversion funnel chart

Conversion funnel for unique visitors (UV)

-- Unique visitor conversion funnel (distinct users per behavior type)
SELECT behavior, COUNT(DISTINCT user_id)
FROM userbehavior
GROUP BY behavior
ORDER BY behavior DESC;

 

 

        Combining the total behavior funnel and the unique visitor funnel shows the following:

  1. Measured by PV, the conversion rate from viewing a product detail page to purchase intention (adding to cart) is only 6.19%, whereas measured by UV it is 75.45%. The average number of product detail page views per purchase is pv/buy = 895636/20359 ≈ 44, which means users browse many product detail pages to compare and screen products before buying. Browsing the product detail page is therefore the key stage for improvement. A good starting point is the recommendation mechanism: recommend accurately based on users' daily browsing behavior to reduce the cost of finding information.
  2. The number of users who paid for an order is 68.92% of the number of users who viewed a product detail page, which reflects a high purchase conversion rate: the products on Taobao meet the purchase needs of most users.
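The ratios quoted above can be derived directly in SQL instead of by hand. The sketch below assumes that "purchase intention" corresponds to the cart behavior, following the simplified funnel; the alias names are illustrative.

```sql
-- Event-based (PV) funnel ratios.
SELECT pv, cart, buy
	,ROUND(100 * cart / pv, 2) AS pv_to_cart_pct
	,ROUND(100 * buy / cart, 2) AS cart_to_buy_pct
	,ROUND(pv / buy, 1) AS pv_per_buy
FROM (
	SELECT SUM(behavior = 'pv') AS pv
		,SUM(behavior = 'cart') AS cart
		,SUM(behavior = 'buy') AS buy
	FROM userbehavior
) AS t;

-- User-based (UV) funnel ratios.
SELECT pv_users, cart_users, buy_users
	,ROUND(100 * cart_users / pv_users, 2) AS pv_to_cart_user_pct
	,ROUND(100 * buy_users / pv_users, 2) AS pv_to_buy_user_pct
FROM (
	SELECT COUNT(DISTINCT IF(behavior = 'pv', user_id, NULL)) AS pv_users
		,COUNT(DISTINCT IF(behavior = 'cart', user_id, NULL)) AS cart_users
		,COUNT(DISTINCT IF(behavior = 'buy', user_id, NULL)) AS buy_users
	FROM userbehavior
) AS t;
```

Note that the user-based ratios are simple ratios of distinct user counts; they do not require that a buyer also viewed or carted the same product within the window.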

        Suggestions for improving the conversion rate at these stages:

  1. Optimize the platform's search matching and recommendation strategy: proactively recommend more relevant products based on user preferences, improve the accuracy and aggregation of product search, and refine the ranking of search results.
  2. Highlight the key information users care about on the product detail page and simplify how information is presented, to reduce the cost of finding information.

4.2 Analyze user behavior from the time dimension

4.2.1 Daily User Behavior Analysis

-- Daily user behavior (每日用户数 = daily users, 浏览数 = views, 收藏数 = favorites, 加购数 = add-to-cart, 购买数 = purchases)
SELECT date
	,COUNT(DISTINCT user_id) AS '每日用户数'
	,SUM(IF(behavior = 'pv', 1, 0)) AS '浏览数'
	,SUM(IF(behavior = 'fav', 1, 0)) AS '收藏数'
	,SUM(IF(behavior = 'cart', 1, 0)) AS '加购数'
	,SUM(IF(behavior = 'buy', 1, 0)) AS '购买数'
FROM userbehavior
GROUP BY date;

Daily changes in user behavior data

        Within the statistical window from November 25, 2017 to December 3, 2017, November 25-26 and December 2-3 are weekends.

        The daily data shows that from November 25 to December 1 the metrics fluctuated very little, while on December 2-3 all metrics rose significantly above the levels of the previous 7 days. The previous weekend (November 25-26) showed no comparable rise, so the increase on December 2-3 has little to do with it being a weekend. The chart also shows that daily active users, page views, favorites, and add-to-cart actions rose more markedly than purchases. The increase is therefore probably related to the warm-up campaign for Taobao's Double 12 shopping festival, which boosts pre-purchase behaviors such as browsing, favoriting, and adding to cart.
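The weekend labelling used in this comparison can be produced in the query itself. The following is a sketch with illustrative alias names; it relies on MySQL's DAYOFWEEK convention, where 1 is Sunday and 7 is Saturday.

```sql
-- Daily metrics with a weekday name and a weekend flag.
SELECT `date`
	,DAYNAME(`date`) AS weekday_name
	,DAYOFWEEK(`date`) IN (1, 7) AS is_weekend
	,COUNT(DISTINCT user_id) AS daily_users
	,SUM(behavior = 'pv') AS views
	,SUM(behavior = 'buy') AS purchases
FROM userbehavior
GROUP BY `date`
ORDER BY `date`;
```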

4.2.2 Hourly User Behavior Analysis

-- Hourly user behavior (每小时用户数 = users per hour)
SELECT hour
	,COUNT(DISTINCT user_id) AS '每小时用户数'
	,SUM(IF(behavior = 'pv', 1, 0)) AS '浏览数'
	,SUM(IF(behavior = 'fav', 1, 0)) AS '收藏数'
	,SUM(IF(behavior = 'cart', 1, 0)) AS '加购数'
	,SUM(IF(behavior = 'buy', 1, 0)) AS '购买数'
FROM userbehavior
GROUP BY hour;

Hourly changes in user behavior data

        The hourly data shows a trough at around 2:00-5:00, a small peak with minor fluctuations from 9:00 to 18:00 (with slight dips at 12:00 and 16:00-17:00), and a major peak from 18:00 to 23:00 that reaches the daily maximum at around 21:00. This pattern matches the normal daily routine of most users.

        Operating strategies can take advantage of this pattern by scheduling live streams, promotions, and other marketing activities between 20:00 and 22:00, when users are most active.

4.3 Analyzing User Behavior from the Product Dimension

        Product popularity can be analyzed along two dimensions: sales and page views. Products with many views may simply have attracted users through their pages or advertising, or users may only be curious and not necessarily buy; products with high sales are more likely to be what users actually need. The two dimensions therefore need to be analyzed together.

4.3.1 Analysis of product rankings

4.3.1.1 Top 10 Product Sales List

Query the ten best-selling products

-- Number of distinct products sold
SELECT COUNT(DISTINCT item_id)
FROM userbehavior
WHERE behavior = 'buy';
-- Top 10 products by sales (购买次数 = number of purchases)
SELECT item_id, COUNT(behavior) AS '购买次数'
FROM userbehavior
WHERE behavior = 'buy'
GROUP BY item_id
ORDER BY 购买次数 DESC
LIMIT 10;

 

        Among the 17,565 products sold, the best-selling product was purchased only 17 times, and only 5 products were purchased more than 10 times. The analyzed data set therefore contains no blockbuster products; customer demand is met mainly through product variety. This suggests focusing on increasing product diversity rather than on creating blockbuster products.

4.3.1.2 Top 10 Product Views List

Query the top ten products with the most page views

-- Top 10 products by page views (浏览次数 = number of views)
SELECT item_id, COUNT(behavior) AS '浏览次数'
FROM userbehavior
WHERE behavior = 'pv'
GROUP BY item_id
ORDER BY 浏览次数 DESC
LIMIT 10;

Join the top 10 by sales with the top 10 by page views for a first look at how sales and page views are related

-- Join the sales ranking with the page view ranking (top 10)
SELECT a.item_id, a.购买次数, b.浏览次数
FROM (
	SELECT item_id, COUNT(behavior) AS '购买次数'
	FROM userbehavior
	WHERE behavior = 'buy'
	GROUP BY item_id
	ORDER BY 购买次数 DESC
	LIMIT 10
) AS a
LEFT JOIN (
	SELECT item_id, COUNT(behavior) AS '浏览次数'
	FROM userbehavior
	WHERE behavior = 'pv'
	GROUP BY item_id
	ORDER BY 浏览次数 DESC
	LIMIT 10
) AS b
	ON a.item_id = b.item_id;

Join the top 20 by sales with the top 20 by page views

-- Join the sales ranking with the page view ranking (top 20)
SELECT a.item_id, a.购买次数, b.浏览次数
FROM (
	SELECT item_id, COUNT(behavior) AS '购买次数'
	FROM userbehavior
	WHERE behavior = 'buy'
	GROUP BY item_id
	ORDER BY 购买次数 DESC
	LIMIT 20
) AS a
LEFT JOIN (
	SELECT item_id, COUNT(behavior) AS '浏览次数'
	FROM userbehavior
	WHERE behavior = 'pv'
	GROUP BY item_id
	ORDER BY 浏览次数 DESC
	LIMIT 20
) AS b
	ON a.item_id = b.item_id
WHERE 浏览次数 IS NOT NULL;

 

Join of the top 20 lists for sales, page views, favorites, and add-to-cart actions

        The results show that none of the 10 best-selling products appears in the top 10 by page views, and only 3 of the 20 best-selling products appear in the top 20 by page views and add-to-cart count. Sales therefore correlate poorly with page views (and with favorites and add-to-cart actions): a best-selling product does not necessarily have many views, favorites, or add-to-cart actions, so sales and page views need to be analyzed together. The next section divides products into four quadrants by sales and page views, analyzes the user behavior behind each group, and proposes corresponding improvements.

4.3.2 Four-quadrant division of commodities

        The cut-off values for sales and page views are set to 4 and 40 respectively (in practice they should be chosen to fit the business scenario), and products are divided into four quadrants by sales volume and number of page views.

-- Page views and sales for every product
SELECT item_id
	,SUM(IF(behavior = 'pv', 1, 0)) AS '浏览次数'
	,SUM(IF(behavior = 'buy', 1, 0)) AS '购买次数'
FROM userbehavior u
GROUP BY item_id
ORDER BY 购买次数 DESC;

        Quadrant Ⅰ: Both page views and sales are high, which indicates a high conversion rate. These are products that users like.

        Optimization: Promote the products in this quadrant to increase their exposure, and run more campaigns to attract additional potential users.

        Quadrant Ⅱ: Sales are relatively high but page views are low. There are two possible reasons:

        ① The products may be necessities for a specific group, whose search and browsing goals are relatively clear;

        ② The products have a broad audience and a high conversion rate, but few traffic entry points, which results in low exposure.

        Optimization: Collect information about the users who buy and browse the products in this quadrant, build user profiles, and combine them with product characteristics to verify whether the products have a specific audience.

        ① If they do, the platform can push targeted recommendations to these users and can also set up a dedicated community for them, which makes communication easier and further increases user stickiness;

        ② If they do not, promote the products more and assign them high-frequency search keywords to raise exposure and add traffic entry points. As page views grow, sales may grow with them.

        Quadrant Ⅲ: Both page views and sales are low. The causes may lie in the traffic entry points or in the product itself.

        Optimization:

        ① Consider whether the product is under-promoted and has too few traffic entry points, and try increasing its exposure.

        ② If sales remain sluggish after exposure has been increased, users are not interested in the product. Consider whether the product is something users really need; products that keep performing poorly can be reworked or phased out.

        Quadrant Ⅳ: Page views are high but sales are low, which indicates a low conversion rate. The reasons can be analyzed from the following angles:

        ① Target audience: the promotion is attractive but not well targeted, so many non-target users click on the product without buying it;

        ② Pricing: the price is too high and cheaper comparable substitutes exist, so users switch to similar products;

        ③ Product detail page, customer service, and reviews: users cannot get the detailed product information they need from the detail page or customer service, or the product has many negative reviews, or the reviews mention issues users care about, so users do not buy;

        ④ Purchase process: coupons are complicated to use or the purchase process is cumbersome, so users give up.

        Optimization: Based on the possible reasons above, use user research, A/B testing, and similar methods to identify the actual cause and address it specifically.
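The four-quadrant split described above can be sketched in SQL as follows. The cut-offs 40 (views) and 4 (sales) come from the text; treating a value at or above the cut-off as "high" is an assumption, and the quadrant labels and alias names are illustrative.

```sql
-- Assumption: "high" means at or above the cut-off (views >= 40, sales >= 4).
SELECT quadrant, COUNT(*) AS item_cnt
FROM (
	SELECT item_id
		,CASE
			WHEN views >= 40 AND sales >= 4 THEN 'I: high views, high sales'
			WHEN views < 40 AND sales >= 4 THEN 'II: low views, high sales'
			WHEN views < 40 AND sales < 4 THEN 'III: low views, low sales'
			ELSE 'IV: high views, low sales'
		END AS quadrant
	FROM (
		-- Per-product view and purchase counts.
		SELECT item_id
			,SUM(behavior = 'pv') AS views
			,SUM(behavior = 'buy') AS sales
		FROM userbehavior
		GROUP BY item_id
	) AS s
) AS q
GROUP BY quadrant
ORDER BY quadrant;
```

Counting the products per quadrant first gives a quick sense of whether the chosen cut-offs separate the catalogue in a useful way before any strategy is attached to them.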

4.3.3 "Long Tail Effect" Analysis

        The long tail effect takes its name from two statistical terms, "head" and "tail". The protruding middle part of the normal curve is called the "head", and the relatively flat parts on both sides are called the "tail". In terms of demand, most demand is concentrated in the head, which can be called the mainstream, while the demand spread along the tail is individual, scattered, and small in quantity. This differentiated, low-volume demand forms a long "tail" on the demand curve, and the long tail effect lies in its sheer size: added together, all the niche markets can form a market comparable to, or even larger than, the mainstream one.

        At its root, the long tail effect emphasizes "personalization", "customer power", and "small profits, big market": earn only a little from each customer, but earn it from a great many customers. When a market is subdivided into very small segments, the accumulation of those small markets produces a pronounced long tail effect.

Group products by sales volume

-- Count products by number of purchases (购买次数 = purchases per product, 商品量 = number of products)
SELECT t.购买次数, COUNT(t.item_id) AS '商品量'
FROM (
	SELECT item_id, COUNT(item_id) AS '购买次数'
	FROM userbehavior
	WHERE behavior = 'buy'
	GROUP BY item_id
	ORDER BY 购买次数 DESC
) AS t
GROUP BY t.购买次数
ORDER BY 商品量 DESC;

        Of the 17,565 products sold, 15,536 were purchased only once, which is 88.45% of all products sold. Taobao's sales therefore rely mainly on the cumulative effect of long-tail products rather than on blockbuster products.
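The share quoted above can be computed in one query rather than read off the grouped result. The sketch below also reports what fraction of all purchase events comes from products sold exactly once, which is an additional, illustrative measure of the long tail; the alias names are assumptions.

```sql
-- Share of products sold exactly once, by product count and by purchase events.
SELECT COUNT(*) AS items_sold
	,SUM(buy_cnt = 1) AS items_sold_once
	,ROUND(100 * SUM(buy_cnt = 1) / COUNT(*), 2) AS sold_once_item_pct
	,ROUND(100 * SUM(buy_cnt = 1) / SUM(buy_cnt), 2) AS sold_once_purchase_pct
FROM (
	SELECT item_id, COUNT(*) AS buy_cnt
	FROM userbehavior
	WHERE behavior = 'buy'
	GROUP BY item_id
) AS t;
```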

4.4 Analysis of User Behavior Based on RFM User Hierarchy Model

        Because the data set does not include order amounts, the M (monetary) dimension is not considered. Only the R (recency) and F (frequency) dimensions are analyzed: each is graded and scored, and users are then stratified by their combined score.

4.4.1 R dimension analysis

        Compute each user's R value, the number of days between the user's most recent purchase and December 3, 2017 (the smaller the R value, the more recent the last purchase), and score it. The R values are divided into three intervals, [0:2], [3:5], and [6:8], which receive R_score values of 3, 2, and 1 respectively.

-- RFM model: R dimension
CREATE VIEW r_value AS
	SELECT user_id, DATEDIFF('2017-12-03', MAX(date)) AS R
	FROM userbehavior
	WHERE behavior = 'buy'
	GROUP BY user_id;
-- Score the R dimension
CREATE VIEW r_score AS
	SELECT user_id, R
		,CASE 
			WHEN R BETWEEN 0 AND 2 THEN 3
			WHEN R BETWEEN 3 AND 5 THEN 2
			ELSE 1
		END AS R_score
	FROM r_value;
-- Count users per R_score
SELECT R_score, COUNT(R_score)
FROM r_score
GROUP BY R_score
ORDER BY R_score DESC;

Proportion of users per R_score

        The R_score proportions show that more than half of the buyers made their most recent purchase within the last 3 days of the window, which indicates good user stickiness.

4.4.2 F Dimension Analysis

        Compute each user's purchase frequency, the F value (the larger the F value, the more often the user purchased during the window), and score it. The F values (maximum 72) are divided into six intervals, [1:9], [10:19], [20:29], [30:39], [40:49], and [50:72], which receive F_score values of 1, 2, 3, 4, 5, and 6 respectively.

-- RFM model: F dimension
CREATE VIEW f_value AS
	SELECT user_id, COUNT(behavior) AS F 
	FROM userbehavior
	WHERE behavior = 'buy'
	GROUP BY user_id;
-- Score the F dimension
CREATE VIEW f_score AS
	SELECT user_id, F
		,CASE 
			WHEN F BETWEEN 1 AND 9 THEN 1
			WHEN F BETWEEN 10 AND 19 THEN 2
			WHEN F BETWEEN 20 AND 29 THEN 3
			WHEN F BETWEEN 30 AND 39 THEN 4
			WHEN F BETWEEN 40 AND 49 THEN 5
			ELSE 6
		END AS F_score
	FROM f_value;
-- Count users per F_score
SELECT F_score, COUNT(F_score)
FROM f_score
GROUP BY F_score
ORDER BY F_score DESC;

Proportion of users per F_score

        The F_score proportions show that, within the statistical window, 96.76% of buyers purchased 1-9 times on Taobao, and only 3.24% purchased 10 times or more.
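The proportions discussed above can be returned alongside the counts. This sketch reuses the f_score view and a scalar subquery for the total so that it also runs on MySQL versions without window functions; the alias names are illustrative.

```sql
-- Users and percentage share per F_score.
SELECT F_score
	,COUNT(*) AS user_cnt
	,ROUND(100 * COUNT(*) / (SELECT COUNT(*) FROM f_score), 2) AS user_pct
FROM f_score
GROUP BY F_score
ORDER BY F_score;
```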

4.4.3 User Hierarchy

        The R and F scores are added together, and users are stratified by the combined score into four tiers: 2-3 points, 4-5 points, 6-7 points, and 8-9 points, corresponding to churn-prone users, users to retain, developing users, and loyal users respectively.

-- Combined RF score
CREATE VIEW rf_score AS
	SELECT r.user_id, R_score, F_score
		,R_score + F_score AS RF_score
	FROM r_score r join f_score f 
		ON r.user_id = f.user_id;
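-- Stratify users and count the users in each tier
-- Tier labels: 易流失用户 = churn-prone, 挽留用户 = users to retain, 发展用户 = developing, 忠实用户 = loyal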
SELECT 用户分层, COUNT(*) AS user_cnt
FROM (
	SELECT *
		,CASE WHEN RF_score BETWEEN 2 AND 3 THEN '易流失用户'
			WHEN RF_score BETWEEN 4 AND 5 THEN '挽留用户'
			WHEN RF_score BETWEEN 6 AND 7 THEN '发展用户'
			ELSE '忠实用户' END AS '用户分层'
	FROM rf_score
) AS t
GROUP BY 用户分层;

Proportion of users per tier

User stratification results:

  • Users to retain make up the largest share, and they hold a great deal of untapped value. They should be re-engaged regularly, for example through new-arrival reminders, coupons, and more accurate product recommendations, in order to retain them and increase their purchase frequency;
  • Churn-prone users make up a relatively large share. They may have found alternatives on other platforms or had a poor product experience. Survey them to find the reasons for churn, and win them back promptly with price incentives, coupons, and similar measures;
  • Developing users make up a relatively small share. New products and promotions can be pushed to them regularly to further increase their purchase frequency;
  • Loyal users make up the smallest share. They are high-value users who need dedicated operating strategies, such as exclusive discounts and dedicated customer service, to maintain their stickiness.

Assessment of the stratification:

        Judging from the proportions of the different tiers, this stratification does not work well, probably for two reasons:

  1. The intervals chosen for the two dimensions are not reasonable, so users are poorly separated. The distribution of users along each dimension should be examined first, and the intervals chosen to fit the actual business scenario;
  2. The two dimensions use different numbers of intervals and very different score ranges, which amounts to giving the two dimensions different weights.

        A four-quadrant scatter plot over the R and F dimensions might work better.
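One possible way to prepare data for such a four-quadrant view is sketched below. It reuses the r_value and f_value views and splits users at the average R and average F; using the averages as thresholds is an assumption made for illustration, and the segment labels are hypothetical.

```sql
-- Sketch: four RF segments split at the average R and average F (assumed thresholds).
SELECT rf_segment, COUNT(*) AS user_cnt
FROM (
	SELECT r.user_id
		,CASE
			WHEN r.R <= a.avg_r AND f.F >= a.avg_f THEN 'recent, frequent'
			WHEN r.R <= a.avg_r AND f.F < a.avg_f THEN 'recent, infrequent'
			WHEN r.R > a.avg_r AND f.F >= a.avg_f THEN 'lapsed, frequent'
			ELSE 'lapsed, infrequent'
		END AS rf_segment
	FROM r_value AS r
	JOIN f_value AS f
		ON r.user_id = f.user_id
	CROSS JOIN (
		-- Overall averages used as the quadrant boundaries.
		SELECT AVG(rv.R) AS avg_r, AVG(fv.F) AS avg_f
		FROM r_value AS rv
		JOIN f_value AS fv
			ON rv.user_id = fv.user_id
	) AS a
) AS t
GROUP BY rf_segment;
```

Because both dimensions are split into exactly two groups, neither dimension is implicitly weighted more heavily than the other, which addresses the second problem noted above.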

5. Conclusions and suggestions

        This article analyzes nearly 1 million Taobao user behavior records along four dimensions. The overall conclusions and suggestions are as follows:

5.1 User Behavior Conversion Funnel Analysis

  1. The repurchase rate is high and the bounce rate is low, so the products on Taobao are attractive enough to users. Taobao is currently in a "loyalty mode", where the focus is on retaining existing customers and maintaining user loyalty.
  2. The conversion analysis shows that the products on Taobao meet the needs of most users (the purchase conversion rate is high). Measured by UV, the conversion rate from viewing a product detail page to purchase intention is 75.45%, but measured by PV it is only 6.19%, which indicates that users browse many product detail pages to compare and screen products before buying. Browsing the product detail page is the key stage for improvement. A good starting point is the recommendation mechanism: recommend accurately based on users' daily browsing behavior to reduce the cost of finding information.

        Suggestions for improving the conversion rate at these stages:

  1. Optimize the platform's search matching and recommendation strategy: proactively recommend more relevant products based on user preferences, improve the accuracy and aggregation of product search, and refine the ranking of search results.
  2. Highlight the key information users care about on the product detail page and simplify how information is presented, to reduce the cost of finding information.

5.2 Time Dimension User Behavior Analysis

  1. By date, user behavior metrics differ little between weekends and weekdays but are strongly affected by large platform events such as Double 12. The scope of the analysis could be expanded further: for example, perform a year-on-year comparison, mark each major shopping festival, focus on how user behavior changes before and after it, and compare weekends to assess how festival promotions affect weekend and non-weekend behavior. A month-by-month analysis over a full year could also compare purchase trends and check whether purchases rise at particular times of the month (combined with user age data, such a rise might be related to payday).
  2. By hour, user activity peaks between 20:00 and 22:00. Operating strategies can take advantage of this pattern by scheduling live streams, promotions, and other marketing activities in that window, when users are most active.

5.3 Analyzing User Behavior in Commodity Dimensions

        Product sales correlate poorly with product page views: products with many page views do not necessarily sell well, and products that sell well do not necessarily have many page views. Blindly increasing page views will therefore not necessarily increase sales. Based on the four-quadrant analysis, the focus should be on improving the products in the second, third, and fourth quadrants:

  1. For products in the second quadrant (high sales, low page views), analyze user profiles. If there is a specific audience, the platform can push targeted recommendations to these users and can also set up a dedicated community for them, which makes communication easier and further increases user stickiness. If there is no specific audience, promote the products more and assign them high-frequency search keywords to raise exposure and add traffic entry points;
  2. For products in the third quadrant (low sales, low page views), try increasing exposure and check whether sales rise with it. If sales remain sluggish after exposure has been increased, users are not interested in the product. Consider whether the product is something users really need; products that keep performing poorly can be reworked or phased out;
  3. For products in the fourth quadrant (low sales, high page views), examine the target audience, pricing, product detail page, customer service and reviews, and the purchase process, and use user research, A/B testing, and similar methods to identify the actual cause and address it specifically.

5.4 "80/20 Rule" or "Long Tail Effect"

        The analysis shows that sales on the Taobao platform are driven mainly by the long tail effect rather than by best-selling products. However, a very wide product range is an operating burden for merchants and is costly to maintain. Following the 80/20 rule, merchants can instead earn profits by creating blockbuster products. For such products, it is advisable to improve quality through quality control, invest more in publicity (bringing in traffic from other platforms), and highlight product advantages in the presentation (main image, detail page, reviews).

5.5 RFM model analysis

        Use the RFM model to stratify users, and adopt a different operating strategy for each tier:

  1. Users to retain: these users hold a great deal of untapped value. Re-engage them regularly, for example through new-arrival reminders, coupons, and more accurate product recommendations, to retain them and increase their purchase frequency;
  2. Churn-prone users: these users may have found alternatives on other platforms or had a poor product experience. Survey them to find the reasons for churn, and win them back promptly with price incentives, coupons, and similar measures;
  3. Developing users: push new products and promotions to these users regularly to further increase their purchase frequency;
  4. Loyal users: these are high-value users who need dedicated operating strategies, such as exclusive discounts and dedicated customer service, to maintain their stickiness.

Origin blog.csdn.net/KOGAMIKEI/article/details/129394608