Project: Taobao user data analysis

1. Project background

1. Project description:

The data set for this project contains about 100 million user behavior records collected between November 25, 2017 and December 3, 2017. The first 5 million records were extracted for analysis. Each record consists of five comma-separated fields: user ID, product ID, product category ID, behavior type, and timestamp. The main analysis tool is Python.

2. Data set source and introduction:

From Alibaba Cloud Tianchi official data set: User Behavior Data from Taobao for Recommendation

Field name description:

Column name          Description
User ID              Integer, serialized user ID
Product ID           Integer, serialized product ID
Product category ID  Integer, serialized product category ID
Behavior type        String, enumeration type, one of ('pv', 'buy', 'cart', 'fav')
Timestamp            The timestamp when the behavior occurred

Behavior type description:

Behavior type  Description
pv             Product details page view, equivalent to a click
buy            Product purchase
cart           Add product to shopping cart
fav            Favorite (bookmark) a product
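
The loading step described above could be sketched roughly as follows. This is a minimal sketch, not the author's code: the column labels in `COLUMNS` and the function name `load_user_behavior` are assumptions chosen for illustration, and the sketch assumes the raw file has no header row.

```python
import pandas as pd

# Assumed column labels (hypothetical names; the raw file is assumed to have no header).
COLUMNS = ["user_id", "item_id", "category_id", "behavior", "timestamp"]


def load_user_behavior(path, nrows=5_000_000):
    """Read the first `nrows` comma-separated behavior records into a DataFrame."""
    return pd.read_csv(path, header=None, names=COLUMNS, nrows=nrows)
```

Reading only the first `nrows` rows keeps memory use manageable, which matches the decision to analyze a 5-million-record extract rather than the full data set.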

2. Clarify the problem and analysis purpose, and establish an analysis framework

1. The analysis framework is as follows:

[Figure: analysis framework diagram]

2. Questions and analysis purposes:

2.1 Problem definition and purpose:
① Calculate the conversion rate of each step, analyze the churn rate, and improve the weak steps
② Analyze product sales, find patterns in user preferences and in the time dimension, and adopt corresponding strategies (such as event promotions and push notifications) for the product types and times that users prefer
③ Find the core user group (because the sales amount field is missing, analyze by purchase frequency and most recent purchase) and adopt differentiated strategies for core users

2.2 Problem analysis process:
① Break the data down by product and by user behavior
② View basic indicators such as pv, uv, bounce rate (users who browse only once), and visitor payment conversion rate (number of users who purchased / uv)
③ Break products down by major category and subcategory, mainly analyzing the relationship between sales volume and product category
④ Analyze user behavior along the time dimension and the Taobao behavior dimension
⑤ Use simplified versions of the RFM model and the AARRR model to break down the data

3. Data preprocessing

Data preview and processing

1. Preview
[Figures: preview of the raw data]
2. Viewing and processing of missing values, outliers, and duplicate values

  • Missing values:
    [Figure: missing-value counts per column]
    The data is complete and there are no missing values.

  • Outliers:
    Filter out records whose timestamp falls outside November 25, 2017 to December 3, 2017.

  • Duplicate values:
    [Figure: duplicate-row check]
    There are 5 duplicate rows in total; they are removed.

3. Timestamp conversion: convert the timestamp and add the converted columns to the original DataFrame

[Figure: data after timestamp conversion]
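
The cleaning steps in this section (missing-value check, deduplication, date-range filter, timestamp conversion) might look something like the sketch below. It makes several assumptions that are not stated in the original: the column name `timestamp`, that timestamps are Unix seconds, and that dates should be interpreted in China Standard Time (`Asia/Shanghai`). The function names are hypothetical.

```python
import pandas as pd


def quality_report(df):
    """Return (missing values per column, number of fully duplicated rows)."""
    return df.isnull().sum(), int(df.duplicated().sum())


def preprocess(df, start="2017-11-25", end="2017-12-04", tz="Asia/Shanghai"):
    """Drop duplicates, keep rows in [start, end), and add datetime/date/hour columns."""
    out = df.drop_duplicates()
    # Filter on the raw integer first so extreme outlier timestamps never get converted.
    lo = int(pd.Timestamp(start, tz=tz).timestamp())
    hi = int(pd.Timestamp(end, tz=tz).timestamp())
    out = out[(out["timestamp"] >= lo) & (out["timestamp"] < hi)].copy()
    # Unix seconds -> UTC -> local time (assumed zone) -> naive local datetime.
    dt = (pd.to_datetime(out["timestamp"], unit="s", utc=True)
          .dt.tz_convert(tz)
          .dt.tz_localize(None))
    out["datetime"] = dt
    out["date"] = dt.dt.date
    out["hour"] = dt.dt.hour
    return out.reset_index(drop=True)
```

The upper bound is exclusive (`2017-12-04`), so every record on December 3 is kept. If the data were recorded in a different time zone, the `tz` argument would need to change accordingly.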

4. Traffic indicator data analysis

1. Absolute indicator analysis

Active user definition: a user with 3 or more behavior records on a given day.
Paying users, pv, and uv are as follows:
[Figure: daily pv, uv, active users, and paying users]

[Figure: daily trend of the four indicators]
Summary: All four indicators (PV, UV, active users, paying users) increased significantly on December 2 and December 3. Compared with the November 25-26 averages, they increased by 150,000, 1.3 million, 0.95 million, and 0.42 million, respectively. The likely cause is the extra traffic brought by warm-up activities ahead of the Double 12 event.
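
As a rough illustration of how the four absolute indicators could be computed, here is a minimal sketch. It assumes a DataFrame with columns named `user_id`, `behavior`, and `date` (hypothetical names), and it uses the definition given above of an active user as one with 3 or more behavior records in a day.

```python
import pandas as pd


def daily_absolute_metrics(df, active_threshold=3):
    """Return daily PV, UV, active users, and paying users, indexed by date."""
    pv = df[df["behavior"] == "pv"].groupby("date").size()
    uv = df.groupby("date")["user_id"].nunique()
    # Number of behavior records per user per day.
    per_user = df.groupby(["date", "user_id"]).size()
    active = per_user[per_user >= active_threshold].groupby(level="date").size()
    paying = df[df["behavior"] == "buy"].groupby("date")["user_id"].nunique()
    out = pd.DataFrame(
        {"pv": pv, "uv": uv, "active_users": active, "paying_users": paying}
    )
    # Dates with no matching rows for a metric become 0 instead of NaN.
    return out.fillna(0).astype(int)
```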

2. Relative indicator analysis

Daily pv per capita: per capita views, daily pv / daily uv;
Daily activity rate (approximate): number of daily active users / total uv;
Daily paying user ratio: number of daily paying users / daily uv;
Daily bounce rate: number of users who generate only one pv behavior that day / daily uv.
The indicators are as follows:
[Figures: daily relative indicators]
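
The four ratios defined above could be computed along these lines. This is a sketch under assumptions: the column names `user_id`, `behavior`, and `date` are hypothetical, and a "bounce" is interpreted here as a user whose only record that day is a single pv, which is one reading of the definition and may differ from the author's.

```python
import pandas as pd


def daily_relative_metrics(df, active_threshold=3):
    """Return daily pv per capita, activity rate, paying ratio, and bounce rate."""
    per_user = df.groupby(["date", "user_id"]).size()
    pv_per_user = (
        df[df["behavior"] == "pv"]
        .groupby(["date", "user_id"]).size()
        .reindex(per_user.index, fill_value=0)
    )
    daily_uv = per_user.groupby(level="date").size()
    total_uv = df["user_id"].nunique()  # distinct users over the whole period
    daily_pv = pv_per_user.groupby(level="date").sum()
    active = (per_user >= active_threshold).groupby(level="date").sum()
    paying = (
        df[df["behavior"] == "buy"]
        .groupby("date")["user_id"].nunique()
        .reindex(daily_uv.index, fill_value=0)
    )
    # Assumed bounce definition: exactly one record that day, and it is a pv.
    bounced = ((per_user == 1) & (pv_per_user == 1)).groupby(level="date").sum()
    return pd.DataFrame({
        "pv_per_capita": daily_pv / daily_uv,
        "activity_rate": active / total_uv,
        "paying_ratio": paying / daily_uv,
        "bounce_rate": bounced / daily_uv,
    })
```

Note that the activity rate divides by total UV rather than daily UV, following the "approximate" definition in the text.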

Summary:
① As the figure shows, per capita PV fluctuates between 12 and 14 over these 9 days, which is normal. The daily activity rate rose sharply on December 2 and December 3 (an average increase of about 17pp), presumably because of warm-up activities for the upcoming Double 12 promotion. However, the proportion of paying users dropped slightly: by about 1.5pp on average compared with the previous two days, and by about 0.6pp compared with the previous weekend (the difference between this weekend's average and the previous weekend's average). A likely explanation is that during the warm-up period users select products (adding to cart, favoriting, and so on) and then place orders on Double 12, when discounts are larger. Users tend to wait about 10 days for the better price.
② The daily bounce rate fluctuates around 10%-11% over the 9 days, which is stable and low overall, suggesting that platform users are sticky.

5. Analysis of product sales

1. Analyze product categories

The following table shows the number of purchases and overall proportion grouped by product category:
[Figure: purchases and share by product category]

Summary: The top 20% of product categories (811 of 4,055 categories) together account for approximately 83.95% of purchases, which is in line with the 80/20 rule. The focus should be on these top 20% of categories, which can then be subdivided into subcategories for refined operational strategies.
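
The 80/20 check described in this summary can be sketched as below. The column names `behavior` and `category_id` and the function name `top_share` are assumptions for illustration; the sketch counts purchase records per category and sums the share held by the top fraction.

```python
import pandas as pd


def top_share(df, key="category_id", top_frac=0.2):
    """Return (number of top groups, their share of all purchases)."""
    buys = df[df["behavior"] == "buy"]
    counts = buys[key].value_counts()  # sorted from most to least purchased
    n_top = max(1, int(len(counts) * top_frac))
    share = counts.iloc[:n_top].sum() / counts.sum()
    return n_top, share
```

Passing a different `key` (for example an item ID column) would apply the same check at the product level.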

2. Analyze product purchases

① Descriptive statistics on the number of purchases per product are as follows:
[Figure: descriptive statistics of purchases per product]
② Share of purchases accounted for by the top 20% of products relative to all products:

  • The overall ratio is as follows:

[Figure: purchase share of the top 20% of products]

  • Because the data set is large, only the top 20 best-selling products are visualized:
    [Figure: top 20 best-selling products]
    Summary: The best-selling product was purchased 71 times. Out of 70,881 total purchases, that is about 1‰. As the most popular product, it is a candidate for differentiated strategies (such as more exposure or a higher ranking) to further increase its sales.
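
For the product-level view, a short sketch of the descriptive statistics, the top-N list, and the best seller's share of all purchases might look like this. The column names `behavior` and `item_id` and the function name are hypothetical.

```python
import pandas as pd


def item_purchase_summary(df, n=20):
    """Return (describe() of purchases per item, top-n items, best seller's share)."""
    counts = df.loc[df["behavior"] == "buy", "item_id"].value_counts()
    top_item_share = counts.iloc[0] / counts.sum()
    return counts.describe(), counts.head(n), top_item_share
```

With the figures quoted above, the best seller's share would be 71 / 70,881, which is roughly 0.001, or about 1‰.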

6. User behavior analysis

1. Group analysis according to different times:

① Different dates:
[Figures: user behavior counts by date]
Summary: The trend in the figure shows that all four types of user behavior increased significantly on December 2 and December 3. pv and add-to-cart increased the most, while purchases increased relatively little, which is consistent with the explanation given in Part 4 for the slight decrease in the proportion of paying users.

② Different time periods (hour dimension):
[Figures: user behavior counts by hour]
Summary: The breakdown by hour shows that users are most active between 19:00 and 23:00, which matches the daily routine of users aged 16-40. User profiles can be analyzed for this group, and message pushes, promotions, and similar activities can be scheduled during these active hours.
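
Both time-based groupings (by date and by hour) follow the same pattern, which could be sketched as a single helper. The column names `behavior`, `date`, and `hour` are assumptions, as is the function name `behavior_counts`.

```python
import pandas as pd

BEHAVIORS = ["pv", "fav", "cart", "buy"]


def behavior_counts(df, by):
    """Count each behavior type per value of `by` (for example 'date' or 'hour')."""
    table = df.groupby([by, "behavior"]).size().unstack(fill_value=0)
    # Fix the column order and make sure all four behaviors are present.
    return table.reindex(columns=BEHAVIORS, fill_value=0)
```

Calling `behavior_counts(df, "hour")` gives one row per hour, from which the busiest hours can be read off or plotted.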

2. Analyze by behavior:

① The overall 9-day data and funnel chart are as follows (add-to-cart and favorite are treated as the same intermediate behavior):
[Figures: overall behavior counts and funnel chart]

Summary: The funnel chart shows that the overall conversion rate is relatively good: the conversion rate from (cart+fav) to (buy) is as high as 23.58%. Strategies such as reminding users to add products to the cart or favorites and offering coupons for doing so can therefore encourage users to cart or favorite the products they like and improve the overall purchase conversion rate.
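
The three-step funnel (pv, then cart+fav, then buy) could be computed along the following lines. This sketch counts behavior records rather than distinct users, which is an assumption about how the original rates were derived. The column name `behavior` and the function name `funnel` are hypothetical, and the sketch assumes the pv and cart+fav counts are non-zero.

```python
import pandas as pd


def funnel(df):
    """Return counts and step-to-step conversion rates for pv -> cart+fav -> buy."""
    c = df["behavior"].value_counts()
    pv = int(c.get("pv", 0))
    mid = int(c.get("cart", 0)) + int(c.get("fav", 0))
    buy = int(c.get("buy", 0))
    return pd.DataFrame(
        {
            "count": [pv, mid, buy],
            # Rate relative to the previous step (1.0 for the first step).
            "step_rate": [1.0, mid / pv, buy / mid],
        },
        index=["pv", "cart+fav", "buy"],
    )
```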
② Statistics of conversion rates at each step by date are as follows:
[Figure: daily conversion rates]
Summary: As the previous section shows, all four indicators (pv, fav, cart, buy) increased significantly on December 2 and December 3, but the daily conversion rate chart shows that the conversion rate from add-to-cart and favorite to purchase declined, including relative to the previous weekend (November 25 and November 26). A likely explanation is that the Double 12 warm-up activities attract users to browse, add to cart, and favorite, while the actual purchases may happen on Double 12, when discounts are larger.
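
A per-day version of the same conversion rates might be sketched as follows, again counting behavior records and assuming hypothetical column names `date` and `behavior`. Days with a zero denominator produce `inf` or `NaN` rather than raising an error.

```python
import pandas as pd


def daily_conversion(df):
    """Return daily pv -> cart+fav and cart+fav -> buy conversion rates."""
    t = df.groupby(["date", "behavior"]).size().unstack(fill_value=0)
    t = t.reindex(columns=["pv", "cart", "fav", "buy"], fill_value=0)
    mid = t["cart"] + t["fav"]
    return pd.DataFrame({
        "pv_to_cart_fav": mid / t["pv"],
        "cart_fav_to_buy": t["buy"] / mid,
    })
```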

7. RFM model analyzes user importance

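Note: Because the M (monetary amount) column is missing, user value is analyzed using only R (time of the most recent purchase) and F (purchase frequency).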

[Figure: user segments from the R and F analysis]
Summary: Important value users account for the largest share, and general development customers rank second. The aim should be to maintain the share of important value users and reduce the share of general development customers.
Different operating strategies should be adopted for users with different values:

  • For important value customers, provide differentiated services to improve satisfaction: give their experience the highest priority, improve their retention rate, and offer promotions and other benefits, taking care that pushed activities do not degrade the user experience.
  • For important retaining customers, who shop frequently but have not purchased recently, push products of interest based on the recommendation algorithm, push coupons, and run friend-recall and similar activities to win them back.
  • For important retained customers, who have purchased recently but shop infrequently, gather information through questionnaires, product reviews, and feedback, analyze where users are dissatisfied, make improvements, and promote repeat purchases.
  • For general development customers, regularly send push notifications or text messages to recall them, and try to convert them into one of the two groups above (important retaining or important retained customers).
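
A simplified R/F segmentation of the kind described in this section could be sketched as below. Several choices here are assumptions rather than the author's method: splitting at the mean of each dimension, measuring recency in whole days from the latest purchase in the data, and the column names `user_id`, `behavior`, and `datetime`. The four segment labels are descriptive placeholders, not the customer-group names used in the text.

```python
import numpy as np
import pandas as pd


def rf_segments(df, ref_date=None):
    """Label each buyer by recency (R) and frequency (F) of purchases."""
    buys = df[df["behavior"] == "buy"]
    ref = pd.Timestamp(ref_date) if ref_date is not None else buys["datetime"].max()
    rf = buys.groupby("user_id").agg(
        last_buy=("datetime", "max"),
        frequency=("datetime", "size"),
    )
    rf["recency_days"] = (ref - rf["last_buy"]).dt.days
    # Assumed thresholds: split each dimension at its mean.
    recent = rf["recency_days"] <= rf["recency_days"].mean()
    frequent = rf["frequency"] >= rf["frequency"].mean()
    rf["segment"] = np.select(
        [recent & frequent, ~recent & frequent, recent & ~frequent],
        ["recent_frequent", "lapsed_frequent", "recent_infrequent"],
        default="lapsed_infrequent",
    )
    return rf
```

`rf["segment"].value_counts(normalize=True)` then gives the share of each group, which is the kind of breakdown the summary above refers to. Other thresholds, such as medians or fixed score bands, would be equally reasonable and would change the group sizes.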

8. Summary

1. More than 20% of users who add products to the cart or favorites go on to purchase, a high conversion rate. Measures such as making the product details page more attractive, offering coupons for adding to cart or favorites, and showing an add-to-cart reminder after a certain browsing time can increase the add-to-cart and favorite rates, and in turn the purchase rate.
2. Event marketing can be scheduled for the dates and hours when users are active (Monday to Friday, 19:00 to 23:00), pushing products of interest to users during this period.
3. Since the top 20% of product categories account for about 80% of purchases, most traffic and resources should be tilted toward these categories to produce more hit products. At the same time, promising products in the remaining 80% of categories should receive appropriate traffic support to keep the platform healthy overall.
4. After stratifying users with the RFM model, adopt different marketing methods for each group to carry out precise marketing, and prioritize limited company resources on the most important customers to maximize profit.


Origin blog.csdn.net/weixin_43195011/article/details/109110831