Maternal and infant product sales analysis (with Python source code and Tableau files)

Project source files: Baidu cloud disk, extraction code 6zdz
Data source: Ali Tianchi

To save space, this article includes as little source code as possible; the project files contain detailed comments.
This case study combines Python and Tableau. Because the data set is small and has few dimensions, Tableau is used for the visualizations.

Project Introduction

Project Background

A brief analysis from the four perspectives of the PEST framework:

  • Politics: On May 28, 2013, the National Development and Reform Commission announced that 13 departments would introduce a series of policy measures supporting e-commerce in five areas: trusted transactions, mobile payments, online electronic invoices, commercial circulation, and logistics distribution. These measures favor rapid growth of the online maternal and infant product market.
  • Economy: With steady growth in the domestic economy, the disposable income of urban residents in China rose to 31,195 yuan in 2015, and that of rural residents rose to 11,422 yuan in the same period. Rising per capita disposable income increases households' willingness to spend. The market size of China's maternal and infant industry was expected to reach 2 trillion yuan in 2015.
  • Society: For residents of first-tier cities, being able to shop anytime and anywhere through phones, computers, and other networked devices suits their fast-paced lives; for residents of second- and third-tier cities and rural areas, the convenience of home delivery also makes online shopping more attractive.
  • Technology: The spread of 4G networks, the rapid upgrade cycle of mobile devices such as phones and iPads, and the development of online payment systems have given strong impetus to the rapid rise of e-commerce.

Analysis purpose

  1. Help online merchants set sales and operations strategies for different time periods and scenarios, so that they can increase sales volume and turnover and reduce operating costs.
  2. Predict which products a user will buy from the child's information (age, gender, etc.). (Not yet completed)

Problem solving

Insert picture description here

Data overview

Ali_Mum_Baby is a data set containing information (birthday and gender) on more than 9 million children. Consumers provided this information in order to get better recommendations and search results.
The sample used here consists of two CSV files.
Baby information table

Column      Description
user_id     user id
birthday    child's birthday
gender      0 = female, 1 = male, 2 = unknown

Transaction record table

Column      Description
item_id     item id (named auction_id in the raw file)
user_id     user id
cat_id      category id
cat1        root category id
property    properties of the corresponding item
buy_mount   purchase quantity
day         timestamp

Data preparation

Import Data

import pandas as pd

# Load the baby information table and the transaction record table
baby = pd.read_csv("./sam_tianchi_mum_baby.csv")
trade = pd.read_csv("./sam_tianchi_mum_baby_trade_history.csv")

Overview data

  • The baby table has 3 columns and 953 rows, with no missing values.
  • The trade table has 7 columns and 29,971 rows, with no missing values.
  • trade.property holds item properties encoded as numeric strings. Without a dictionary to decode them they cannot be interpreted, so we set this column aside for now.
    Insert picture description here
    Insert picture description here
  • buy_mount in the trade table is the key measure we care about. The descriptive statistics and the plot show a mean of 2.5 and a standard deviation of 64, with clear outliers, so we keep only the records within 3 standard deviations of the mean, that is, buy_mount in [0, 195].
    Insert picture description here
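
The 3-standard-deviation rule described above can be sketched as follows. This is a minimal illustration on made-up numbers, not the project's actual code; trade_sample and its values are hypothetical stand-ins for the trade table.

```python
import pandas as pd

# Hypothetical stand-in for the buy_mount column of the trade table.
trade_sample = pd.DataFrame({"buy_mount": [1] * 20 + [2, 3, 5, 1000]})

mean = trade_sample["buy_mount"].mean()
std = trade_sample["buy_mount"].std()

# Keep rows within 3 standard deviations of the mean.
# Quantities cannot be negative, so the lower bound is clipped at 0.
lower = max(0, mean - 3 * std)
upper = mean + 3 * std
filtered = trade_sample[trade_sample["buy_mount"].between(lower, upper)]

print(f"kept {len(filtered)} of {len(trade_sample)} rows, range [{lower:.1f}, {upper:.1f}]")
```

With the statistics reported in this article (mean 2.5, standard deviation 64) the same rule gives an upper bound of about 2.5 + 3 × 64 = 194.5, which is where the [0, 195] range comes from.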

Data cleaning

  1. Check for missing values and outliers and handle them.
  2. The meaning of auction_id in the trade table is not documented; we treat it as the item id and rename it to item_id.
  3. The property values are all numeric codes. Interpreting them requires a matching dictionary, so we remove the column from the table for now.
  4. Convert day (and birthday) to a date type.
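# info() shows that this data set has no missing values
# Rename the column
trade.rename(columns={"auction_id": "item_id"}, inplace=True)
# Set property aside for now and analyze it later
# (named trade_property to avoid shadowing the built-in property)
trade_property = trade["property"]
trade.drop("property", axis=1, inplace=True)
# Convert the date columns to datetime
baby["birthday"] = pd.to_datetime(baby["birthday"].astype("str"))
trade["day"] = pd.to_datetime(trade["day"].astype("str"))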

After cleaning, 29,942 rows remain. The data covers 2012/7/2 to 2015/2/5. The whole data set contains 6 root categories, 662 subcategories, 28,394 products, and 29,915 users.
Insert picture description here
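
Counts like these can be read directly from the cleaned table. The sketch below assumes the column names used in this article; the rows in trade_sample are invented for illustration.

```python
import pandas as pd

# Hypothetical cleaned trade table with the column names used in this article.
trade_sample = pd.DataFrame({
    "item_id": [101, 102, 103, 103],
    "user_id": [1, 2, 3, 4],
    "cat_id": [5001, 5002, 5003, 5003],
    "cat1": [28, 28, 38, 38],
    "day": pd.to_datetime(["2012-07-02", "2013-11-11", "2014-12-12", "2015-02-05"]),
})

# Row count, date range, and distinct counts per dimension.
summary = {
    "rows": len(trade_sample),
    "start": trade_sample["day"].min(),
    "end": trade_sample["day"].max(),
    "root_categories": trade_sample["cat1"].nunique(),
    "subcategories": trade_sample["cat_id"].nunique(),
    "products": trade_sample["item_id"].nunique(),
    "users": trade_sample["user_id"].nunique(),
}
print(summary)
```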

This is a sampled data set covering a little over two years. Because some of the data is missing, the analysis is limited to what this sample contains.

Data analysis

Overall market situation

Insert picture description here
From 2012/7 to 2015/2, total sales volume was 49,973 units. The chart above shows that sales of maternal and infant products on the Taobao and Tmall platforms trended upward overall, but with fairly large fluctuations.
Insert picture description here

  1. Because the 2015 data is incomplete, the chart cannot reflect actual sales in the first quarter of 2015.
  2. Sales dip in the first quarter of each year and rise substantially in the fourth quarter.

Insert picture description here

  1. Sales declined in the first quarters of both 2013 and 2014, mainly in January and February.
  2. Sales grow to varying degrees every May and November.
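
The charts in this article were built in Tableau. For readers who prefer pandas, the quarterly and monthly totals behind such charts could be computed roughly as follows; trade_sample is a hypothetical miniature of the trade table.

```python
import pandas as pd

# Hypothetical miniature of the cleaned trade table.
trade_sample = pd.DataFrame({
    "day": pd.to_datetime(
        ["2013-01-15", "2013-05-20", "2013-11-11", "2013-11-12", "2014-02-01"]
    ),
    "buy_mount": [1, 2, 5, 3, 1],
})

# Total purchase quantity per (year, quarter).
quarterly = trade_sample.groupby(
    [
        trade_sample["day"].dt.year.rename("year"),
        trade_sample["day"].dt.quarter.rename("quarter"),
    ]
)["buy_mount"].sum()

# Total purchase quantity per calendar month.
monthly = trade_sample.groupby(trade_sample["day"].dt.to_period("M"))["buy_mount"].sum()

print(quarterly)
print(monthly)
```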

Reasons for the decline in sales in the first quarter

Hypothesis: the first-quarter decline is related to the Spring Festival.
Insert picture description here

  • Sales bottomed out from 2013/2/1 to 2013/2/15; the 2013 Spring Festival holiday ran from 2013/2/9 to 2013/2/15.
  • Sales bottomed out from 2014/1/26 to 2014/2/4; the 2014 Spring Festival holiday ran from 2014/1/31 to 2014/2/6.

The 2015 Spring Festival holiday ran from 2015/2/18 to 2015/2/24, but the data set ends on 2015/2/5, so we do not analyze the first quarter of 2015.

Around the Spring Festival, some companies start their holiday early and express delivery is suspended, and the sales trough largely coincides with the holiday period. Once the holiday ends, purchases and the number of buyers recover. We can therefore conclude that the first-quarter sales decline was caused by the Spring Festival holiday.
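
One way to check this kind of hypothesis numerically is to compare average daily sales inside and outside the holiday window. The sketch below uses the 2013 holiday dates quoted above; the daily quantities are invented purely to show the mechanics.

```python
import pandas as pd

# Hypothetical records: one row per day, with lower quantities during the holiday.
days = pd.date_range("2013-01-20", "2013-03-05", freq="D")
trade_sample = pd.DataFrame({"day": days, "buy_mount": 10})

holiday_start = pd.Timestamp("2013-02-09")
holiday_end = pd.Timestamp("2013-02-15")
in_holiday_rows = trade_sample["day"].between(holiday_start, holiday_end)
trade_sample.loc[in_holiday_rows, "buy_mount"] = 2

# Daily sales volume, then the mean inside and outside the holiday window.
daily = trade_sample.groupby("day")["buy_mount"].sum()
is_holiday = (daily.index >= holiday_start) & (daily.index <= holiday_end)
holiday_mean = daily[is_holiday].mean()
regular_mean = daily[~is_holiday].mean()

print(f"holiday mean: {holiday_mean}, regular mean: {regular_mean}")
```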

Reasons for the rise in sales in the fourth quarter

Hypothesis: the fourth-quarter rise is related to the Double Eleven (November 11) and Double Twelve (December 12) promotions.
Insert picture description here

  1. Sales volume clearly surged on Double Eleven and Double Twelve in both 2013 and 2014.
  2. Each Double Eleven event had more buyers and higher sales volume than the previous year's, with the number of buyers growing by 75%-80%.

We can therefore conclude that the fourth-quarter sales increase each year is closely tied to the Double Eleven and Double Twelve events.
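
A year-over-year comparison of Double Eleven buyers and sales volume can be sketched like this. The rows are hypothetical and chosen so that the growth rate is easy to verify by hand.

```python
import pandas as pd

# Hypothetical Double Eleven records: 4 buyers in 2013, 7 buyers in 2014.
trade_sample = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
    "day": pd.to_datetime(["2013-11-11"] * 4 + ["2014-11-11"] * 7),
    "buy_mount": [1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1],
})

# Select November 11 of every year.
is_double_eleven = (trade_sample["day"].dt.month == 11) & (trade_sample["day"].dt.day == 11)
d11 = trade_sample[is_double_eleven]

# Distinct buyers and total quantity per year.
double_eleven = d11.groupby(d11["day"].dt.year.rename("year")).agg(
    users=("user_id", "nunique"),
    sales=("buy_mount", "sum"),
)

# Year-over-year growth in the number of buyers.
user_growth = double_eleven["users"].pct_change()
print(double_eleven)
print(user_growth)
```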

Repurchase rate

Insert picture description here
Insert picture description here
Monthly repurchase rates are extremely low. The repurchase rate of every root category is also extremely low, none exceeding 1%; root category 38 has the highest, at 0.17%. Since most orders are for a single item and the repurchase rate is low, users evidently have little desire to buy the same product again. Merchants should examine product-side factors such as product quality and shopping experience.
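
The article does not spell out how the repurchase rate is defined. The sketch below assumes one common definition: within a root category, a user counts as a repurchaser if they bought on more than one distinct day, and the rate is the share of such users. Both the definition and the data are assumptions for illustration.

```python
import pandas as pd

# Hypothetical trade records. User 4 has two rows on the same day,
# which counts as a single purchase day under the assumed definition.
trade_sample = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 4, 4],
    "cat1": [28, 28, 28, 38, 38, 38],
    "day": pd.to_datetime([
        "2013-01-05", "2013-03-10", "2013-02-01",
        "2013-04-01", "2013-05-05", "2013-05-05",
    ]),
})

# Number of distinct purchase days per (root category, user).
purchase_days = trade_sample.groupby(["cat1", "user_id"])["day"].nunique()

# Share of users in each root category with more than one purchase day.
repurchase_rate = (purchase_days > 1).groupby(level="cat1").mean()
print(repurchase_rate)
```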

Commodity sales

Insert picture description here
Categories 28 and 50008168 have the highest sales. Category 38 has low sales volume and the fewest subcategories, but its per capita purchase volume is very high. Demand for these products is strong, so adding a moderate number of subcategory products under root category 38 could increase sales.
Category 12265008 has low sales and low per capita demand, which indicates weak user demand for these products. We recommend reducing procurement in this category to avoid inventory backlog.
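
The three measures compared here (sales volume, number of subcategories, per capita purchase volume) can be computed per root category in one pass. The following is a sketch on hypothetical rows, assuming per capita volume means total quantity divided by distinct buyers.

```python
import pandas as pd

# Hypothetical trade records for two root categories.
trade_sample = pd.DataFrame({
    "cat1": [28, 28, 28, 38, 38],
    "cat_id": [2801, 2802, 2803, 3801, 3801],
    "user_id": [1, 2, 3, 4, 5],
    "buy_mount": [1, 1, 2, 6, 4],
})

cat_stats = trade_sample.groupby("cat1").agg(
    sales=("buy_mount", "sum"),          # total quantity sold
    users=("user_id", "nunique"),        # distinct buyers
    subcategories=("cat_id", "nunique"), # distinct subcategories
)
# Assumed definition: per capita volume = total quantity / distinct buyers.
cat_stats["per_capita"] = cat_stats["sales"] / cat_stats["users"]
print(cat_stats.sort_values("sales", ascending=False))
```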

Baby situation


After inner-joining the two tables on user_id, we find a baby born in 1984. This is clearly an outlier, so we remove it.
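
A minimal sketch of this join and cleanup step, assuming the tables share the user_id key as listed in the data overview. The rows below are invented, including the 1984 birthday.

```python
import pandas as pd

# Hypothetical miniature tables; user 2 carries the implausible 1984 birthday.
baby_sample = pd.DataFrame({
    "user_id": [1, 2, 3],
    "birthday": pd.to_datetime(["2013-03-01", "1984-06-16", "2011-10-10"]),
    "gender": [0, 1, 0],
})
trade_sample = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "cat1": [28, 28, 38, 38],
    "buy_mount": [1, 2, 1, 1],
})

# The inner join keeps only users present in both tables (user 4 is dropped).
merged = trade_sample.merge(baby_sample, on="user_id", how="inner")

# Drop the record with the 1984 birthday.
merged = merged[merged["birthday"].dt.year != 1984]
print(merged)
```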
Insert picture description here
Since the data runs to 2015/2, we take 2015/3 as the analysis date. Among users who buy maternal and infant products, the babies are mostly 0-3 years old.
Insert picture description here
Of the households purchasing maternal and infant products, 47.1% have baby boys and 52.9% have baby girls.
Insert picture description here
We divide babies into five age stages: unborn, infancy (0-12 months), early childhood (1-3 years), preschool (3-7 years), and school age (7+ years).

The picture above shows the popular categories for babies at each stage:

  • Unborn: 50014815, 50022520, 50008168, 28
  • Infancy: 50014815, 50022520, 50008168, 28
  • Early childhood: 50014815, 50008168, 28
  • Preschool: 50008168, 28
  • School age: 50008168

As babies grow older, demand for products in category 50008168 gradually increases, while demand for products in category 50014815 gradually decreases.
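
The age stages used in this section can be assigned with pd.cut. The sketch below assumes that age is measured at the time of purchase (day minus birthday), which is what makes an "unborn" stage possible; the article does not state this explicitly, and the rows are hypothetical.

```python
import pandas as pd

# Hypothetical joined records: purchase date and the child's birthday.
merged_sample = pd.DataFrame({
    "day": pd.to_datetime(["2014-01-01"] * 5),
    "birthday": pd.to_datetime(
        ["2014-06-01", "2013-07-01", "2012-01-01", "2009-01-01", "2005-01-01"]
    ),
})

# Assumption: age in years at the time of purchase; negative means not yet born.
age_years = (merged_sample["day"] - merged_sample["birthday"]).dt.days / 365.25

# Left-closed bins: [-inf, 0), [0, 1), [1, 3), [3, 7), [7, inf)
bins = [-float("inf"), 0, 1, 3, 7, float("inf")]
labels = ["unborn", "infancy", "early childhood", "preschool", "school age"]
merged_sample["stage"] = pd.cut(age_years, bins=bins, labels=labels, right=False)
print(merged_sample)
```

Grouping by stage and category then gives the per-stage category ranking summarized in the list above.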
Insert picture description here
Households with baby girls clearly buy more than households with baby boys. Let's look at the product categories.
Insert picture description here
71.05% of the sales records for subcategory 50018831 under root category 50014815 come from families with baby girls. The purchase records also contain quite a few high-selling products for which families of baby girls account for 100% of purchases.
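
A per-category gender share like the 71.05% figure can be computed with a row-normalized cross-tabulation of sales records. This is a sketch on invented rows: category 50018831 appears in the article, while category 111 and all counts are hypothetical.

```python
import pandas as pd

# Hypothetical joined records: one row per sales record.
# gender follows the data overview: 0 = female, 1 = male, 2 = unknown.
merged_sample = pd.DataFrame({
    "cat_id": [50018831, 50018831, 50018831, 50018831, 111, 111, 111],
    "gender": [0, 0, 0, 1, 0, 1, 2],
})

# Ignore records where the gender is unknown.
known = merged_sample[merged_sample["gender"].isin([0, 1])]

# Share of sales records by gender within each category (each row sums to 1).
share = pd.crosstab(known["cat_id"], known["gender"], normalize="index")
share = share.rename(columns={0: "female", 1: "male"})
print(share)
```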

Summary

Product sales

  1. Sales of maternal and infant products are increasing year by year, but monthly fluctuations are fairly large.
  2. Because of the Spring Festival, first-quarter sales hit the yearly low; driven by the Double Eleven and Double Twelve promotions, fourth-quarter sales reach the yearly peak.
  3. The user repurchase rate is extremely low; product quality, price, and purchase experience should be reviewed and improved.
  4. Categories 50014815, 50008168, and 28 are the top three by sales.
  5. Although category 38 has low sales volume, its per capita purchase volume is very high. Consider adding subcategories under this category to give users more choice and increase sales.

User portrait

  1. Demand is highest among users with children in early childhood (1-3 years old). As babies grow older, demand for maternal and infant products gradually decreases.
  2. The shares of baby-boy and baby-girl households are close, but baby-girl households buy significantly more than baby-boy households.
  3. For some products, baby-girl households make up a significantly larger share of buyers than baby-boy households. These products could be tailored further toward baby girls to encourage more baby-girl families to buy.

Suggestions

  1. In the week before the Spring Festival, reduce promotional spending and procurement and keep inventory low. During the Double Eleven and Double Twelve warm-up phases, step up promotion and run more operational campaigns to attract traffic, and increase inventory to ensure a stable supply. Add customer service staff and coordinate with logistics providers early so that user questions are answered promptly and shipping stays efficient, improving the purchase experience.
  2. The product repurchase rate is low. Follow up with past buyers, analyze why they do not repurchase, and address those factors.
  3. Baby-girl families buy more than baby-boy families. We recommend promoting more products designed specifically for baby boys to raise purchases by baby-boy families.
  4. Expand the subcategory products under each root category, especially root category 38, to give users more choice, increase subcategory sales, and in turn increase root category sales.
  5. Reduce procurement of products in category 12265008 to avoid inventory backlog.

Personal opinion, for reference only.

Origin blog.csdn.net/weixin_41261833/article/details/105673282