Python crawls a treasure pineapple data and visually analyzes sales

The text and pictures in this article are from the Internet and are for learning and communication purposes only, and do not have any commercial use. If you have any questions, please contact us for processing.

The following article comes from Zhibin’s python notes, the author Zhibin

Python crawler, data analysis, website development and other case tutorial videos are free to watch online

https://space.bilibili.com/523606542 

Python learning exchange group: 1039645993

Preface

If you want to talk about which kind of fruit is the hottest recently, you have to talk about pineapple. With the topic "One out of every three Chinese pineapples comes from Xu Wen", it has been hotly searched on all major platforms. Xuwen pineapple has quickly become a hot commodity in the consumer market.

With the opening of the Xuwen pineapple high-speed rail, the transportation cost and time cost of pineapples have been greatly reduced, which means that we can eat fresher pineapples at a lower price. In this case, would you still worry that online shopping is not new?

 

data collection

This article uses Python to collect in detail the sales data of 1774 pineapples from Taobao.com, and obtains the store name, product name, price, origin, sales volume and other data of pineapple. Since the previous article has introduced the Taobao merchant data collection method in detail (not Understand this article and use Requests+Cookie to easily get Taobao product data!), so here we directly upload the code:

response = requests.get('https://s.taobao.com/search', headers=headers, params=params)

shangpinming = re.findall('"raw_title":"(.*?)"',response.text)
jiage = re.findall('"view_price":"(.*?)"',response.text)
fahuodi = re.findall('"item_loc":"(.*?)"',response.text)
fukuanrenshu = re.findall('"view_sales":"(.*?)人付款"',response.text)
dianpumingcheng = re.findall('"nick":"(.*?)"',response.text)

 

data processing

We opened the Excel file to observe the data and found that there are many duplicate data, as shown in the figure:

Python crawls a treasure pineapple data and visually analyzes sales

 

It may be caused by the existence of some store data in different pages. We can use pandas to clean the data, but here we can use a simpler way to clean the repeated data, that is, Excel, which comes with There is the function of deleting duplicate items, as shown in the figure:

Python crawls a treasure pineapple data and visually analyzes sales

 

After data processing, data preview:

Python crawls a treasure pineapple data and visually analyzes sales

 

data visualization

This article uses Excel to visualize pineapple data, because Excel is even better than Python in terms of drawing!

Pineapple price distribution map

Python crawls a treasure pineapple data and visually analyzes sales

 

According to the figure, 45% of the price of pineapples is below 30 yuan, and most of them are below 100 yuan. According to the national per capita disposable income announced by the National Bureau of Statistics, it is quite simple to realize pineapple freedom.

Python crawls a treasure pineapple data and visually analyzes sales

 

Those stores have better sales

Python crawls a treasure pineapple data and visually analyzes sales

 

From the figure, we can see that 9 of the top ten stores are flagship stores. It seems that when prices are lower, people pay more attention to product quality.

 

The relationship between price and sales

Python crawls a treasure pineapple data and visually analyzes sales

 

From the scatter chart, we can see that prices and sales are basically inversely proportional, that is, the lower the price, the higher the sales.

Three points are higher, which may be due to the reputation of the store.

 

Where is pineapple abundant in China

Python crawls a treasure pineapple data and visually analyzes sales

 

I visualized the location of Taobao stores and found that most of the stores are concentrated in coastal areas such as Guangdong, Hainan, and Zhejiang. I specifically searched the Internet for the conditions of pineapple production:

Python crawls a treasure pineapple data and visually analyzes sales

 

The characteristics of pineapples on sale

Python crawls a treasure pineapple data and visually analyzes sales

 

We made all the product names into a word cloud map. From the word cloud map, we can see that the keywords of the pineapple product data are: fresh, pineapple, canned food, snacks, and Hainan. FCL, free shipping, etc.

Guess you like

Origin blog.csdn.net/m0_48405781/article/details/115006265