E-commerce user behavior data analysis (MySQL+Tableau)

1. Project introduction

1.1 Project Background
UserBehavior is a Taobao user behavior dataset provided by Alibaba for research on implicit-feedback recommendation. The data file (UserBehavior.csv) contains all behaviors of about one million randomly selected users who were active between November 25, 2017 and December 3, 2017. The behaviors are clicks, purchases, add-to-cart actions, and favorites.

1.2 Analysis purpose
The purpose of this analysis is to explain the following issues and suggest improvements, based on Taobao user behavior data:
① Compute the common e-commerce indicators for Taobao users and build a user conversion funnel to determine the loss rate at each step and find the steps that need optimization.
② Analyze user behavior across different time dimensions, find the patterns of user activity in different periods, and plan campaigns accordingly.
③ Identify users' product preferences and find suitable marketing strategies for different products.
④ Use the RFM model to stratify users, analyze the behavior of each user type, and propose corresponding operating strategies.

1.3 Data source
Data source: Taobao user shopping behavior dataset, Alibaba Cloud Tianchi
The dataset file is UserBehavior.csv. It is organized like MovieLens-20M: each row represents one user behavior and consists of the user ID, product ID, product category ID, behavior type, and timestamp, separated by commas.
There are four types of user behavior: pv (viewing an item detail page, equivalent to a click), buy (purchase), cart (add to cart), and fav (favorite).
[Figure: dataset size statistics]

2. Analysis framework
[Figure: analysis framework diagram]

3. Data cleaning

3.1 Import data and modify table fields
The fields of the source table are not clearly named, so rename them and set appropriate data types at the same time. The fields are user_id, item_id, category_id, behavior_type, and time_stamp.

3.2 Remove duplicate values
Declare every field NOT NULL and use user_id, item_id, and time_stamp together as the primary key.
This step ensures that the table contains no empty or duplicate values.
The result shows no duplicate values.
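
The duplicate check can be sketched in Python as follows. This is only an illustration, not the original query: it assumes each record is a tuple in the column order user_id, item_id, category_id, behavior_type, time_stamp, and the function name `find_duplicate_keys` and the sample rows are hypothetical.

```python
from collections import Counter

def find_duplicate_keys(rows):
    """Return the (user_id, item_id, time_stamp) keys that occur more than once."""
    counts = Counter((r[0], r[1], r[4]) for r in rows)
    return [key for key, n in counts.items() if n > 1]

# Hypothetical rows: (user_id, item_id, category_id, behavior_type, time_stamp)
sample_rows = [
    (1, 100, 10, "pv", 1511544070),
    (1, 100, 10, "pv", 1511544070),  # same key as the row above
    (2, 200, 20, "buy", 1511561733),
]
duplicates = find_duplicate_keys(sample_rows)
print(duplicates)
```

An empty list means the chosen key is unique and can safely serve as a primary key.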

3.3 Finding missing values
The query results show no missing values, so the data quality of the dataset is high.

3.4 Convert time format
Convert the Unix timestamp in time_stamp into date and time fields so that behavior can later be grouped by day and by hour.
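
A timestamp conversion of this kind can be sketched in Python as follows. It is an illustration, not the original SQL. It assumes time_stamp holds Unix seconds and that dates should be read in China Standard Time (UTC+8); the helper name `split_timestamp` is made up for the example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the analysis reads timestamps in China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8))

def split_timestamp(ts):
    """Convert Unix seconds into a (date string, hour of day) pair."""
    dt = datetime.fromtimestamp(ts, tz=CST)
    return dt.strftime("%Y-%m-%d"), dt.hour

print(split_timestamp(1511539200))  # ('2017-11-25', 0)
```

Passing an explicit time zone keeps the result independent of the machine's local settings.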

3.5 Filter outliers
Query the outliers and delete them; 511 outliers are removed in total. Then query again to confirm that they have been eliminated.
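
The filtering step might look like the sketch below. The text does not say how an outlier is defined, so the sketch assumes it means a record whose timestamp falls outside the window of November 25 to December 3, 2017 (UTC+8). The constants and the name `split_by_window` are illustrative.

```python
# Assumption: an outlier is a record outside the analysis window (UTC+8).
WINDOW_START = 1511539200  # 2017-11-25 00:00:00 UTC+8
WINDOW_END = 1512316800    # 2017-12-04 00:00:00 UTC+8 (exclusive)

def split_by_window(rows):
    """Split rows into (kept, outliers) by the timestamp in column 5."""
    kept, outliers = [], []
    for row in rows:
        if WINDOW_START <= row[4] < WINDOW_END:
            kept.append(row)
        else:
            outliers.append(row)
    return kept, outliers

# Hypothetical rows: (user_id, item_id, category_id, behavior_type, time_stamp)
rows = [
    (1, 1, 1, "pv", 1511539200),  # first second of the window
    (1, 1, 1, "pv", 1512316799),  # last second of the window
    (2, 1, 1, "pv", 1511539199),  # one second too early
    (3, 1, 1, "pv", 1512316800),  # one second too late
]
kept, outliers = split_by_window(rows)
print(len(kept), len(outliers))  # 2 2
```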

4. Data analysis
4.1 Analyze user behavior based on the user behavior funnel model
4.1.1 Overall user behavior
4.1.1.1 Common data indicators
Total unique visitors (UV): 10202
Total page views (PV): 939535
Average page views per visitor over the statistical window (PV/UV): about 92
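
These indicators can be sketched in Python as follows. The sketch assumes records are tuples of user_id, item_id, category_id, behavior_type, time_stamp, that PV counts rows with behavior type "pv", and that UV counts distinct users over all rows; `pv_uv` is a hypothetical helper. The last line rechecks the reported ratio.

```python
def pv_uv(rows):
    """Return (PV, UV, PV per visitor) for a list of behavior records."""
    pv = sum(1 for r in rows if r[3] == "pv")
    uv = len({r[0] for r in rows})
    return pv, uv, pv / uv

# Hypothetical rows: (user_id, item_id, category_id, behavior_type, time_stamp)
rows = [
    (1, 10, 1, "pv", 1511539200),
    (1, 11, 1, "pv", 1511539260),
    (1, 11, 1, "buy", 1511539300),
    (2, 10, 1, "pv", 1511539400),
]
print(pv_uv(rows))                # (3, 2, 1.5)
print(round(939535 / 10202, 2))   # 92.09, the "about 92" reported above
```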

4.1.1.2 Retention rate
① Number of users retained on the next day and on days 3, 5, and 7
First, query the users who were active on the first day and create a new table to store the retention data.
Then query the users retained on the next day. The queries for days 3, 5, and 7 work the same way; only the date changes.
② Retention rate
The retention rate stays above 75% and even reaches 77.42% after 7 days, which is relatively high.
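
A retention calculation can be sketched in Python as follows. It assumes retention on a later day is the share of first-day active users who are active again on that day. The mapping from date to active user IDs and the name `retention_rate` are illustrative, not taken from the original queries.

```python
def retention_rate(active_by_day, first_day, later_day):
    """Share of users active on first_day who are also active on later_day."""
    base = active_by_day.get(first_day, set())
    if not base:
        return 0.0
    retained = base & active_by_day.get(later_day, set())
    return len(retained) / len(base)

# Hypothetical data: date -> set of active user IDs
active_by_day = {
    "2017-11-25": {1, 2, 3, 4},
    "2017-11-26": {1, 2, 3, 5},
    "2017-12-01": {1, 4},
}
print(retention_rate(active_by_day, "2017-11-25", "2017-11-26"))  # 0.75
print(retention_rate(active_by_day, "2017-11-25", "2017-12-01"))  # 0.5
```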

4.1.1.3 Repurchase rate
The repurchase rate is the share of buyers who purchase repeatedly within a given period, that is, the number of users with two or more purchases divided by the total number of users with at least one purchase.
The repurchase rate is about 66.27%, which is high and indicates strong loyalty among platform users.
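
The definition above translates into a short Python sketch. It assumes the tuple layout user_id, item_id, category_id, behavior_type, time_stamp and counts each "buy" row as one purchase; `repurchase_rate` is a hypothetical name.

```python
from collections import Counter

def repurchase_rate(rows):
    """Users with 2+ purchases divided by users with 1+ purchases."""
    purchases = Counter(r[0] for r in rows if r[3] == "buy")
    if not purchases:
        return 0.0
    repeat_buyers = sum(1 for n in purchases.values() if n >= 2)
    return repeat_buyers / len(purchases)

# Hypothetical rows: (user_id, item_id, category_id, behavior_type, time_stamp)
rows = [
    (1, 10, 1, "buy", 1511539200),
    (1, 11, 1, "buy", 1511625600),
    (2, 10, 1, "buy", 1511539300),
    (3, 10, 1, "pv", 1511539400),  # never bought, so not in the denominator
]
print(repurchase_rate(rows))  # 0.5
```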

4.1.1.4 Bounce rate
Bounce rate: the percentage of visiting users who viewed only a single page, or the percentage of visiting users who left the site from the homepage.
The bounce rate reflects how well users accept the site's content and whether the site is attractive to them. Whether the content helps users and keeps them on the site shows up directly in the bounce rate, which makes it an important measure of content quality.
The bounce rate is only 0.09%, close to 0, which indicates that the product and shop detail pages are very attractive to users.
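
One way to compute a bounce rate from this data is sketched below. It assumes a bounced user is one whose entire activity in the window is a single page view, which is only one possible reading of the definition above; `bounce_rate` and the sample rows are illustrative.

```python
from collections import defaultdict

def bounce_rate(rows):
    """Share of users whose only recorded behavior is a single page view."""
    behaviors_by_user = defaultdict(list)
    for user_id, _item, _category, behavior, _ts in rows:
        behaviors_by_user[user_id].append(behavior)
    if not behaviors_by_user:
        return 0.0
    bounced = sum(1 for b in behaviors_by_user.values() if b == ["pv"])
    return bounced / len(behaviors_by_user)

# Hypothetical rows: (user_id, item_id, category_id, behavior_type, time_stamp)
rows = [
    (1, 10, 1, "pv", 1511539200),   # user 1: one view only -> bounced
    (2, 10, 1, "pv", 1511539210),
    (2, 10, 1, "buy", 1511539220),
    (3, 10, 1, "pv", 1511539230),
    (3, 11, 1, "pv", 1511539240),
    (4, 12, 1, "pv", 1511539250),   # user 4: one view only -> bounced
]
print(bounce_rate(rows))  # 0.5
```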
Summary:
Taken together, the retention rate, repurchase rate, and bounce rate show that Taobao has high user loyalty and content attractive enough to keep users coming back. The platform should therefore keep investing in user relationships to maintain that loyalty.

4.1.2 User behavior conversion funnel analysis
Funnel analysis is widely used across industries to evaluate the overall conversion rate and the conversion rate of each step, and to assess the effect of promotional campaigns. Combined with in-depth user behavior analysis, it helps find the reasons users are lost, which in turn helps grow the user base, activity, and retention, and makes data-driven decisions more rigorous.
(1) Total user behavior conversion funnel
From the total user behavior conversion funnel:
① After viewing a detail page, the most common next behavior is adding to the cart. Presumably users who are comparing products prefer to add them to the cart.
② Favoriting is less common than adding to the cart. A possible reason is that a favorited item cannot be checked out directly; the user has to click, add to cart, and pay again.
③ Purchasing is the least common behavior, only 2.27% of detail-page views. Many users are lost between clicking and purchasing, so how to reduce that loss is the direction of the following analysis.

(2) Unique visitor conversion funnel
From the unique visitor conversion funnel, the fewest users favorite items, fewer than add items to the cart. Among users who viewed a detail page, the purchase conversion rate is high, reaching 68.47%. The following breaks down which shopping paths drive purchase conversion.
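
The two funnels differ only in what they count: the first counts behaviors (events), the second counts distinct users. A minimal Python sketch of both, assuming the tuple layout user_id, item_id, category_id, behavior_type, time_stamp and the behavior codes pv, cart, fav, buy; the function names are hypothetical.

```python
from collections import Counter

STEPS = ("pv", "cart", "fav", "buy")

def funnel(rows):
    """Return {behavior: (event count, distinct user count)}."""
    events = Counter(r[3] for r in rows)
    users = {s: len({r[0] for r in rows if r[3] == s}) for s in STEPS}
    return {s: (events[s], users[s]) for s in STEPS}

def rate_vs_views(result, step):
    """Return (event conversion, user conversion) of `step` relative to pv."""
    pv_events, pv_users = result["pv"]
    events, users = result[step]
    return events / pv_events, users / pv_users

# Hypothetical rows: 8 page views by 4 users, 2 purchases by 2 users
rows = (
    [(1, 10, 1, "pv", 0)] * 2 + [(1, 10, 1, "cart", 0), (1, 10, 1, "buy", 0)]
    + [(2, 10, 1, "pv", 0)] * 2
    + [(3, 10, 1, "pv", 0), (3, 10, 1, "fav", 0)]
    + [(4, 10, 1, "pv", 0)] * 3 + [(4, 10, 1, "buy", 0)]
)
result = funnel(rows)
print(rate_vs_views(result, "buy"))  # (0.25, 0.5)
```

The toy output shows the same effect as the real data: the event-based rate is much lower than the user-based rate, because one buyer usually views many pages.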

(3) Break down the different behavior paths
Break the purchase behavior down into four paths: click → purchase, click → add to cart → purchase, click → favorite → purchase, and click → favorite → add to cart → purchase.
First create a temporary view to store the behavior data of each user.
① Browse → Purchase path
Browse: 34226
Purchase: 1735
Purchase conversion rate: 5.01%

② Browse → Add to cart → Purchase path
Browse: 316811
Add to cart: 26581
Purchase: 9870
Purchase conversion rate: 3.12%

③ Browse → Favorite → Purchase path
Browse: 37895
Favorite: 3297
Purchase: 1268
Purchase conversion rate: 3.35%

④ Browse → Favorite → Add to cart → Purchase path
Browse: 144245
Favorite: 9843
Add to cart: 9164
Purchase: 4306
Purchase conversion rate: 2.99%

Summary:
The browse → purchase path has the highest purchase conversion rate. Sales can be increased by growing the share of users on this path, for example by adding promotion reminders on browsing pages to prompt users who would otherwise only add to cart or favorite to buy directly.
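
The idea of splitting users by path can be sketched in Python. This is a simplification and not the original temporary view: it classifies each user by the set of behaviors they performed, ignores the order of events, and computes a per-user conversion rate, whereas the figures above appear to be behavior counts. All names are hypothetical.

```python
from collections import Counter

def purchase_path(behaviors):
    """Classify a user's set of behavior codes into one of the four paths."""
    if "pv" not in behaviors:
        return None
    has_cart, has_fav = "cart" in behaviors, "fav" in behaviors
    if has_fav and has_cart:
        return "browse-favorite-cart"
    if has_fav:
        return "browse-favorite"
    if has_cart:
        return "browse-cart"
    return "browse"

def path_conversion(behaviors_by_user):
    """Return {path: share of users on that path who also bought}."""
    totals, buyers = Counter(), Counter()
    for behaviors in behaviors_by_user.values():
        path = purchase_path(behaviors)
        if path is None:
            continue
        totals[path] += 1
        if "buy" in behaviors:
            buyers[path] += 1
    return {path: buyers[path] / totals[path] for path in totals}

# Hypothetical data: user_id -> set of behavior codes
behaviors_by_user = {
    1: {"pv", "buy"}, 2: {"pv"},
    3: {"pv", "cart", "buy"}, 4: {"pv", "cart"},
    5: {"pv", "fav"}, 6: {"pv", "fav", "cart", "buy"},
}
rates = path_conversion(behaviors_by_user)
print(rates)
```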

4.2 Analysis of user behavior from the time dimension
4.2.1 Daily user behavior analysis
[Figure: changes in daily user behavior data]

Within the statistical window from November 25, 2017 to December 3, 2017, November 25-26 and December 2-3 are weekends.

The daily data fluctuates very little from November 25 to December 1, then rises markedly on December 2-3, above the level of every indicator in the previous 7 days. The previous weekend (November 25-26) shows no comparable rise, so the increase on December 2-3 has little to do with the weekend. The daily chart also shows that daily active users, views, favorites, and add-to-cart actions rise more clearly than purchases. The increase is therefore probably related to the warm-up for Taobao's Double 12 promotion, since a warm-up boosts pre-purchase actions such as browsing, favoriting, and adding to cart.

4.2.2 Hourly User Behavior Analysis
[Figure: changes in hourly user behavior data]
The hourly data shows that all indicators are at a low between about 2:00 and 5:00. From 9:00 to 18:00 the data sits at a small plateau with little fluctuation (with slight dips at 12:00 and at 16:00-17:00). From 18:00 to 23:00 every indicator climbs to a large peak, reaching the daily maximum around 21:00. This trend matches the daily routine of most users.
Operating strategies can take advantage of this pattern: schedule marketing such as live streams and promotions between 20:00 and 22:00, when users are most active.
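
Grouping behavior by hour can be sketched in Python as below. It assumes Unix-second timestamps read in UTC+8 and the tuple layout user_id, item_id, category_id, behavior_type, time_stamp; `hourly_counts` is a hypothetical helper, and the same approach works for daily counts by grouping on the date instead of the hour.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: hours are read in China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8))

def hourly_counts(rows):
    """Count behaviors per (hour of day, behavior type)."""
    counts = Counter()
    for row in rows:
        hour = datetime.fromtimestamp(row[4], tz=CST).hour
        counts[(hour, row[3])] += 1
    return counts

midnight = 1511539200  # 2017-11-25 00:00:00 UTC+8
rows = [
    (1, 10, 1, "pv", midnight + 21 * 3600),
    (2, 10, 1, "pv", midnight + 21 * 3600 + 1800),
    (2, 10, 1, "buy", midnight + 21 * 3600 + 1900),
    (3, 10, 1, "pv", midnight + 3 * 3600),
]
counts = hourly_counts(rows)
print(counts[(21, "pv")], counts[(21, "buy")], counts[(3, "pv")])  # 2 1 1
```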

4.3 Analyzing User Behavior from Product Dimensions
Product popularity can be analyzed along two dimensions: sales and page views. A product with many views may have drawn users through its page or advertising, or simply out of curiosity, and those users do not necessarily buy. A product with high sales is more likely something users really need, with a clear purchase intent. The two dimensions therefore need to be analyzed together.
4.3.1 Analysis of product rankings
4.3.1.1 Combined TOP20 rankings by sales, views, add-to-cart actions, and favorites
The four rankings are queried separately and then merged:
Top 20 products by sales
Top 20 products by views
Top 20 products by favorites
Top 20 products by add-to-cart actions
Merge of the four result tables
The results show that only 3 of the top 20 products by sales also rank in the top 20 by views and add-to-cart actions (and none of them in the top 10). Sales therefore correlate poorly with views (and with favorites and add-to-cart actions): a best-selling product does not necessarily have many views, favorites, or add-to-cart actions, so sales and views need to be analyzed together. The following divides products into four quadrants by sales and page views, analyzes the user behavior behind each group, and proposes improvements.

4.3.2 Product four-quadrant division
The cut-off values for sales and page views are 4 and 40 respectively (in practice the cut-offs should be set according to the business scenario). Products are divided into four quadrants by sales volume and page views.
Quadrant 1: Products in this quadrant have high sales and high page views, which means a relatively high conversion rate. These are the popular products.
Optimization suggestions: Promote the products in this quadrant heavily to increase their exposure, and run more campaigns to attract more potential buyers.

Quadrant 2: Products in this quadrant have high sales but low page views. Possible reasons:
① The products are necessities for a specific group. That audience searches with a clear target and is likely to buy directly after browsing.
② The products have a broad audience and an inherently high conversion rate, but few traffic entry points, so their exposure is low.
Optimization suggestions: Collect information on users who have browsed and bought products in this quadrant, build user profiles, and, taking the nature of the products into account, verify whether the products have a specific audience.
① If they do, the platform can push precisely to this type of user and set up dedicated communities for them, giving users a convenient place to communicate and further increasing stickiness.
② If they do not, add traffic entry points for these products, promote them through multiple channels, and set high-frequency search keywords to raise exposure. If views rise, sales may rise with them.

Quadrant 3: Products in this quadrant have low page views and low sales. Both the traffic entry points and the products themselves need to be considered.
Optimization method:
① If the product is under-promoted, which would explain both the low views and the low sales, add traffic entry points to raise its exposure, then monitor whether views and sales increase.
② If the cause is the product itself, that is, users are not interested and views do not rise even with more traffic entry points, consider whether users actually need the product, what its past traffic trend looks like, and whether it is seasonal (and should be promoted at a specific time). If all of these factors are ruled out, consider dropping such products to avoid wasting resources.

Quadrant 4: Products in this quadrant have high page views but low sales. Possible reasons:
① Target audience: promotion is effective but poorly targeted, so many users outside the target audience click through.
② Pricing: if the price is too high and more cost-effective alternatives exist, users switch to similar products.
③ Detail page, customer service, and reviews: users cannot get enough product detail from the shop's detail page or customer service, or there are many negative reviews, or other customers' reviews raise issues that users care about, so users do not buy directly.
④ Purchase process: coupon rules are complicated, or the ordering process is complex, so users abandon the purchase.
⑤ Shipping cost, delivery time, and return policy: shipping is expensive with no discounts, delivery is slow, there is no after-sales guarantee, or users have little trust in the site and give up.
Optimization method: Based on the possible reasons above, use user research, A/B testing, and similar methods to find the actual cause and address it directly.
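
The quadrant assignment described in this section can be sketched in a few lines of Python. The cut-offs 4 and 40 come from the text; treating a value at or above the cut-off as "high" is an assumption, since the boundary handling is not stated. The function name `quadrant` is illustrative.

```python
SALES_CUTOFF = 4   # from the text
VIEWS_CUTOFF = 40  # from the text

def quadrant(sales, views):
    """Return the quadrant number (1-4) for a product.

    Assumption: values at or above a cut-off count as "high".
    """
    high_sales = sales >= SALES_CUTOFF
    high_views = views >= VIEWS_CUTOFF
    if high_sales and high_views:
        return 1  # high sales, high views
    if high_sales:
        return 2  # high sales, low views
    if not high_views:
        return 3  # low sales, low views
    return 4      # low sales, high views

print([quadrant(10, 100), quadrant(10, 5), quadrant(1, 5), quadrant(1, 100)])
# [1, 2, 3, 4]
```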

4.3.3 Analysis of "Long Tail Effect"
Grouping products by sales volume shows that 18,338 distinct products were sold, of which 16,188 were purchased only once, accounting for 88.28% of the total. This indicates that platform sales are driven mainly by the cumulative effect of long-tail products rather than by a few popular products.
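
The long-tail share can be sketched in Python as follows. It assumes the tuple layout user_id, item_id, category_id, behavior_type, time_stamp and counts each "buy" row as one sale; `single_purchase_share` is a hypothetical helper. The last line rechecks the reported percentage.

```python
from collections import Counter

def single_purchase_share(rows):
    """Share of sold products that were purchased exactly once."""
    sales = Counter(r[1] for r in rows if r[3] == "buy")
    if not sales:
        return 0.0
    sold_once = sum(1 for n in sales.values() if n == 1)
    return sold_once / len(sales)

# Hypothetical rows: item 10 sold twice, items 11, 12, 13 sold once each
rows = [
    (1, 10, 1, "buy", 0), (2, 10, 1, "buy", 0),
    (1, 11, 1, "buy", 0), (3, 12, 1, "buy", 0), (4, 13, 1, "buy", 0),
    (5, 14, 1, "pv", 0),  # viewed but never sold, so not counted
]
print(single_purchase_share(rows))       # 0.75
print(round(16188 / 18338 * 100, 2))     # 88.28
```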

4.4 Analysis of User Behavior Based on the RFM User Hierarchical Model
Since the dataset does not contain order amounts, the M dimension is left out of this analysis. Only the R and F dimensions are analyzed: each is graded and scored, and users are then stratified by the combined score.
4.4.1 R Dimension Analysis
The data covers November 25 to December 3, 2017, 9 days in total. The number of days since the last purchase is divided into three intervals:
days since last purchase in [0:2]: R_Score is 3 points;
days since last purchase in [3:5]: R_Score is 2 points;
days since last purchase in [6:8]: R_Score is 1 point.

First compute, for each user, the number of days between the last purchase and 2017-12-03, then assign the score, and finally count the number of users at each score.
The results show that for more than half of the users the last purchase was within 3 days of 2017-12-03, which indicates relatively good user stickiness.
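
The R scoring rule above can be sketched in Python. The intervals and the reference date 2017-12-03 come from the text; the function name `r_score` is illustrative.

```python
from datetime import date

REFERENCE_DAY = date(2017, 12, 3)  # last day of the statistical window

def r_score(last_purchase_day):
    """Score recency: 0-2 days -> 3 points, 3-5 days -> 2, 6-8 days -> 1."""
    days = (REFERENCE_DAY - last_purchase_day).days
    if 0 <= days <= 2:
        return 3
    if 3 <= days <= 5:
        return 2
    if 6 <= days <= 8:
        return 1
    raise ValueError(f"last purchase is outside the 9-day window: {days} days")

print(r_score(date(2017, 12, 3)), r_score(date(2017, 11, 29)), r_score(date(2017, 11, 25)))
# 3 2 1
```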

4.4.2 F Dimension Analysis

Calculate each user's purchase frequency (F value) and score it. Based on the maximum F value of 72 and the minimum of 1, the range is divided into 6 intervals:
purchase frequency in [1:9]: F_Score is 1 point;
purchase frequency in [10:19]: F_Score is 2 points;
purchase frequency in [20:29]: F_Score is 3 points;
purchase frequency in [30:39]: F_Score is 4 points;
purchase frequency in [40:49]: F_Score is 5 points;
purchase frequency in [50:72]: F_Score is 6 points.

Score 1 proportion: 96.71%
Score 2 proportion: 2.87%
Score 3 proportion: 0.27%
Score 4 proportion: 0.07%
Score 5 proportion: 0.01%
Score 6 proportion: 0.06%

Within the statistical window, 96.71% of users purchased 1-9 times on the platform, and only 3.24% of users purchased 10 times or more.
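
The F scoring rule can be written as a small lookup in Python. The six intervals come from the text; `F_BINS` and `f_score` are illustrative names.

```python
# (lowest count, highest count, score), as listed in the text
F_BINS = [(1, 9, 1), (10, 19, 2), (20, 29, 3), (30, 39, 4), (40, 49, 5), (50, 72, 6)]

def f_score(purchase_count):
    """Map a user's purchase count in the window to an F score of 1-6."""
    for low, high, score in F_BINS:
        if low <= purchase_count <= high:
            return score
    raise ValueError(f"purchase count outside the observed range: {purchase_count}")

print([f_score(n) for n in (1, 9, 10, 35, 72)])  # [1, 1, 2, 4, 6]
```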

Users are stratified by their combined RF score (R_Score plus F_Score; the maximum is 9 and the minimum is 2):
RF score of 2-3 points: churn-prone users;
RF score of 4-5 points: users to retain;
RF score of 6-7 points: developing users;
RF score of 8-9 points: loyal users.
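
The stratification can be sketched in Python as follows. It assumes the combined score is the plain sum of R_Score (1-3) and F_Score (1-6) and that the four tiers map to the score bands 2-3, 4-5, 6-7, and 8-9 in the order churn-prone, to retain, developing, loyal. The tier labels and the name `rf_tier` are illustrative.

```python
def rf_tier(r_score, f_score):
    """Map R and F scores to a user tier by their sum (range 2-9)."""
    total = r_score + f_score
    if 2 <= total <= 3:
        return "churn-prone"
    if 4 <= total <= 5:
        return "to retain"
    if 6 <= total <= 7:
        return "developing"
    if 8 <= total <= 9:
        return "loyal"
    raise ValueError(f"combined score out of range: {total}")

print(rf_tier(1, 1), rf_tier(3, 1), rf_tier(3, 4), rf_tier(3, 6))
# churn-prone to retain developing loyal
```

With most users at F_Score 1, a recent buyer (R_Score 3) lands at a combined score of 4, which is consistent with "users to retain" being the largest group.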
User stratification results:
Users to retain make up the largest share. They have great untapped potential value. Reach out to them regularly with new-arrival reminders, coupons, and more accurate product recommendations to retain them and raise their purchase frequency.
Churn-prone users make up a relatively high share. They may have found cheaper alternatives or a better product experience on other platforms. Survey these users to find the reasons for churn, and use price incentives and coupons to win them back.
Developing users make up a relatively low share. Push new products and promotions to them regularly to further raise their purchase frequency.
Loyal users make up the smallest share and are the high-value users. Build a membership program for them with exclusive operating strategies to maintain stickiness, such as exclusive discounts, holiday gifts, free gifts, early access to new products, points redeemable for purchases, and dedicated customer service.

5. Conclusions and Suggestions
This article analyzes nearly 1 million Taobao user behavior records from four dimensions. The overall conclusions and suggestions are as follows:
5.1 User behavior conversion funnel analysis
① The platform has high retention and repurchase rates and a bounce rate close to 0. This shows that products on Taobao are attractive enough to users and that most users habitually shop on the platform. Taobao is currently in a "loyalty mode", so the platform should focus on maintaining the loyalty of existing users.
② By unique visitors (UV), 68.84% of users who viewed a product detail page went on to purchase, which shows that products on Taobao meet the needs of most users. By page views (PV), only 2.27% of detail-page views converted into purchases, which shows that users browse many detail pages to compare and filter before buying. Detail-page browsing is therefore the key step for improvement. A good starting point is the recommendation mechanism: make accurate recommendations based on users' everyday browsing behavior to reduce the time users spend finding information.

Suggestions for improving the conversion rate at this step:
① Optimize the platform's search matching and recommendation strategy according to users' search habits and preferences, so that recommended products match users better and pushes are as accurate as possible.
② On the shop detail page, highlight the key information users care about and simplify the information flow to reduce the time users spend finding information.

5.2 Analysis of User Behavior in the Time Dimension
① By date, users' behavior indicators differ little between weekends and weekdays but are strongly affected by large platform campaigns such as Double 12. The scope of the analysis could be extended, for example:
Year-on-year comparison: mark each major shopping festival, focus on the changes in user behavior before and after it, and compare across weekends to analyze how scheduling a festival promotion on a weekend or a weekday affects user behavior.
Month-by-month analysis within a year: compare purchasing trends and check whether purchases rise at a regular point in the month (combined with user age data, since such rises may be related to pay days).
② By hour, user activity peaks between 20:00 and 22:00. Operating strategies can use this pattern by scheduling live streams, promotions, and similar marketing in that window, when users are most active.

5.3 Analysis of User Behavior in the Product Dimension
Product sales correlate poorly with product page views: products with many page views do not necessarily sell well, and best-selling products do not necessarily have many page views. There is therefore no point in blindly increasing page views, because sales will not necessarily follow. Based on the four-quadrant analysis, the focus should be on improving products in the second, third, and fourth quadrants:
① For products in the second quadrant (high sales, low page views), analyze user profiles to see whether the products have a specific audience. If they do, the platform can push precisely to those users and set up dedicated communities for them, giving users a convenient place to communicate and further increasing stickiness. If they do not, add traffic entry points, promote through multiple channels, and set high-frequency search keywords to raise exposure; as views rise, sales may rise with them.
② For products in the third quadrant (low sales, low page views), first analyze their past traffic trends (for example whether they are seasonal and need promotion at a specific time), then place and add traffic entry points according to those trends and check whether sales rise. If sales stay sluggish after exposure increases, users are not interested in the product, and it is worth asking whether users really need it. If the product still does not perform, consider dropping it.
③ For products in the fourth quadrant (low sales, high page views), look at the target audience, pricing, detail page, customer service and reviews, purchase process, logistics, and after-sales guarantee, and use user research, A/B testing, and similar methods to find the cause and address it directly.

5.4 Product analysis based on the "long tail effect"
The analysis shows that sales on Taobao rely mainly on the "long tail effect" rather than on a few popular products. However, a wide assortment is an operating burden for merchants and raises costs. In practice, a platform can position products by goal (customer acquisition, sales volume or best-sellers, profit):
① Products for customer acquisition are usually distinctive and eye-catching. Promote them heavily to highlight what makes them unique, while watching product quality (return rate) and keeping quality control tight.
② For best-selling products, build a price advantage over competing products, improve quality through quality control, invest more in promotion (driving traffic from other platforms), and highlight product strengths in the main image, detail page, and reviews.
③ For profit products, a more refined operating strategy is needed: analyze the audience's preferences to push precisely, design attractive product images and detail pages that highlight the product's strengths (market differentiation, user needs), and improve product and service quality to raise the positive review rate.

5.5 RFM model analysis
Use the RFM model to stratify users and adopt different operating strategies for different types of users:
① Users to retain: This type makes up the largest share and has great untapped potential value. Reach out to these users regularly with new-arrival reminders, coupons, and more accurate product recommendations to retain them and raise their purchase frequency.
② Churn-prone users: This type makes up a relatively high share. They may have found cheaper alternatives or a better product experience on other platforms. Survey these users to find the reasons for churn, and use price incentives and coupons to win them back.
③ Developing users: This type makes up a relatively low share. Push new products and promotions to these users regularly to further raise their purchase frequency.
④ Loyal users: This type makes up the smallest share and consists of high-value users. Build a membership program for them with exclusive operating strategies to maintain stickiness, such as exclusive discounts, holiday gifts, free gifts, early access to new products, points redeemable for purchases, and dedicated customer service.

Origin blog.csdn.net/YL0621/article/details/129826395