2022 National University Student Data Analysis Contest A Complete Problem-solving Tutorial and Code Analysis of Pharmaceutical E-commerce Sales Data

Topic A: Complete analysis of pharmaceutical e-commerce sales data to solve the problem
With the gradual opening up of national policies, more and more medicines can be purchased online. The pharmaceutical e-commerce platform is booming. Affected by the new crown epidemic, it is difficult to purchase offline pharmacies. It has also allowed pharmaceutical e-commerce to enter the vision of more consumers, and major pharmaceutical companies have also stepped up their efforts to deploy pharmaceutical e-commerce. However, the e-commerce model is different from offline retail. How to better manage pharmaceutical e-commerce has become an urgent problem for pharmaceutical companies to solve. This question collects Tmall vitamin medicines. Please clean, analyze and mine data on vitamin medicines, and answer the following questions.

2.1 First question

Analyze the stores, how many stores are included in total, and what is the sales ratio of each store? Give the store with the highest proportion of sales, and analyze the sales of the store.

Topic analysis: Calculate the deduplication value and number of stores in the shop_name field, calculate the sales, group according to the stores, and calculate the sales ratio of each store. Then sort in descending order to get the store with the highest sales ratio, and perform data summary analysis on other fields of the store with the highest sales

2.2. The second question

Analyze all drugs, how many drugs are included in total, and what is the sales ratio of each drug? Give the 10 drugs with the highest sales ratio, and draw the monthly sales curve of these 10 drugs.

Topic Analysis: Count the deduplication value and number of drugs in the id field (the title cannot be analyzed, because some products have the same title, but the taste or other information is different, I think it may be better to analyze the id), calculate The sales volume is grouped according to the drugs, and the sales ratio of each drug is calculated. Then sort in descending order, count the 10 drugs with the highest sales ratio, and draw the monthly sales curve of these 10 drugs.

2.3. The third question

Analyze all drug brands, how many brands are included in total, and what is the sales ratio of each brand? Give the 10 brands with the highest sales ratio, and analyze the reasons why these 10 brands sell better?

Topic analysis: Calculate the brand de-duplication value and its number in the brand field, calculate the sales, group according to the brand, and calculate the sales proportion of each brand. Then sort in descending order, and count the 10 drugs with the highest sales ratio. You can analyze the reasons for better sales from the corresponding charts of prices and discounts.

2.3. The fourth question

Predict the total sales of Tmall's vitamin drugs in the next three months and draw a fitting curve to evaluate model performance and errors.

Topic analysis: First, select the corresponding products of vitamins in the Tmall store, group them according to time, get a time series data, and then make a time series prediction, you can use the traditional arima model or gray prediction model, you can use Machine learning xgboost, neural network, or lstm model using deep learning, note that this is fitting, so mape can be used to evaluate model performance

2.3. Fifth question

A pharmaceutical company plans to sell a new vitamin brand online and hires you as a consultant to design an e-commerce business strategy of no more than two pages.

Topic analysis: This can be analyzed based on the data in the third question above

The video of the complete problem-solving process has been released:

2022 National College Student Data Analysis Competition A-question nanny-level tutorial and complete problem-solving code_哔哩哔哩_bilibili

Guess you like

Origin blog.csdn.net/weixin_44099072/article/details/128499478