Stop studying how to flash-buy Moutai: this guy uses a web crawler to bargain-hunt for a Mercedes-Benz!

In recent years the "second-hand economy" has become increasingly popular, and the used car market is expanding rapidly along with it.

 

For the same model, a used car is much more affordable than a new one. Take the Mercedes-Benz GLC class in the picture below: a used one can be 50,000-200,000 cheaper than a new one. As a result, more and more people consider used cars when buying a vehicle.

 

image

 

But as we all know, the used car market is murky, and a careless buyer can easily overpay (the so-called "IQ tax"). It is therefore essential to understand the market before buying a used car.

 

Today I am sharing a hands-on project built on a used car website: using Python to analyze the used car market.

 

image

One, define the requirements

image

 

1. Crawl the listings for Mercedes-Benz GLC-class vehicles from a used car website (title, purchase year, mileage, price)

2. Analyze how well used cars retain their value as a function of years in use and mileage

 

image

Two, crawl the data

image

 

Before crawling any data, we first decide which tool, that is, which library, to use. At present there are several ways to write a crawler in Python:

 

image

 

After selecting the tools according to your needs, you can start crawling data.

 

First, the crawler downloads the web page according to our instructions. We then use XPath expressions to extract the content we need from the page: the title, year, mileage, price and other details of each used car. (Remember to write a loop over all the used car listings on the page!)
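The extraction step can be sketched roughly as follows. This is a minimal illustration, not the code from the screenshots: the tag names, class names and sample values are hypothetical, and the page is an inline, well-formed snippet so that the example runs offline with only the standard library. A real crawler would download the page with an HTTP library and would normally use an HTML-aware parser with full XPath support (for example lxml), since `xml.etree.ElementTree` only understands a subset of XPath and requires well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a downloaded listing page.
# The tag and class names are assumptions made for this sketch.
SAMPLE_PAGE = """
<html><body>
  <div class="car">
    <h2> Mercedes-Benz  GLC 260 </h2>
    <p class="sub">2018年 | 3.5万公里</p>
    <span class="price">32.8</span>
  </div>
  <div class="car">
    <h2> Mercedes-Benz  GLC 300 </h2>
    <p class="sub">2020年 | 1.2万公里</p>
    <span class="price">38.9</span>
  </div>
</body></html>
"""


def extract_listings(page_source):
    """Return one dict of raw strings per used car listing on the page."""
    root = ET.fromstring(page_source.strip())
    listings = []
    # Loop over every listing block, as the text reminds us to do.
    for car in root.findall(".//div[@class='car']"):
        listings.append({
            "title": car.findtext("h2"),
            "subtitle": car.findtext("p[@class='sub']"),
            "price": car.findtext("span[@class='price']"),
        })
    return listings


rows = extract_listings(SAMPLE_PAGE)
for row in rows:
    print(row)
```

The values are deliberately left as raw strings here, including stray spaces and the "|" separator, because tidying them up belongs to the cleaning step.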

 

image

 

 

image

Three, data cleaning

image

 

 

What is data cleaning? Data cleaning is the process of re-examining and verifying data. Its purpose is to remove duplicate records, correct existing errors, and make the data consistent.

 

In our example, the crawled title contains extra spaces and the subtitle contains a "|" separator. We need to split the subtitle into its separate fields and strip the word "year" from the year and the unit "10,000 kilometers" from the mileage. A computer can only do calculations on clean numeric values.
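A minimal sketch of this cleaning step is shown below. It assumes that each raw listing is a dict of strings and that the subtitle looks like "2018年 | 3.5万公里", where 年 is the "year" suffix and 万公里 is the "10,000 kilometers" unit on the original site. The field names and the sample record are hypothetical.

```python
def clean_listing(raw):
    """Convert one raw listing (all strings) into clean, numeric fields."""
    # Split the subtitle on "|" and trim the spaces around each part.
    year_text, mileage_text = [part.strip() for part in raw["subtitle"].split("|")]
    return {
        # Collapse repeated spaces and trim the ends of the title.
        "title": " ".join(raw["title"].split()),
        # Drop the "year" suffix so the value can be stored as an integer.
        "year": int(year_text.replace("年", "")),
        # Drop the "10,000 kilometers" unit; the number stays in that unit.
        "mileage_10k_km": float(mileage_text.replace("万公里", "")),
        "price": float(raw["price"]),
    }


# Hypothetical raw record, shaped like the output of the crawling step.
raw_example = {
    "title": " Mercedes-Benz  GLC 260 ",
    "subtitle": "2018年 | 3.5万公里",
    "price": "32.8",
}
cleaned = clean_listing(raw_example)
print(cleaned)
```

Keeping the unit in the column name (`mileage_10k_km`) is one way to avoid losing track of it once the text suffix has been removed.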

 

Finally, use the Pandas library to write the result to a CSV file.
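As a rough sketch of the export, assuming the cleaned records are a list of dicts (the column names and the two sample rows are made up for illustration), the CSV can be produced with `DataFrame.to_csv`. The example writes to an in-memory buffer so it can be run without touching the disk.

```python
import io

import pandas as pd

# Made-up cleaned records; the column names are assumptions of this sketch.
cleaned_rows = [
    {"title": "Mercedes-Benz GLC 260", "year": 2018, "mileage_10k_km": 3.5, "price": 32.8},
    {"title": "Mercedes-Benz GLC 300", "year": 2020, "mileage_10k_km": 1.2, "price": 38.9},
]

df = pd.DataFrame(cleaned_rows)

# index=False keeps the DataFrame's row index out of the file.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

To write an actual file, pass a path instead of the buffer, for example `df.to_csv("glc_used_cars.csv", index=False)`; the file name here is only a placeholder.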

 

image

 

Isn't the data much easier on the eye in this form?

 

 

Four, data visualization

 

Once we have the cleaned data in CSV format, we can plot it to explore it visually and spot its trends and characteristics.

image

 

As the figure shows, the scatter plot on the left makes it clear that the earlier a car was purchased, the more its price clusters in the lower range. The plot on the right shows that mileage and price are negatively correlated.
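Two side-by-side scatter plots of this kind can be drawn with matplotlib roughly as follows. The numbers below are made-up placeholders, not the data behind the figure, and the axis labels are assumptions; in practice the three lists would come from the columns of the cleaned CSV.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

# Made-up illustrative values, NOT the article's data.
years = [2016, 2017, 2018, 2019, 2020]
mileage = [8.2, 6.5, 4.1, 3.0, 1.2]      # in units of 10,000 km
prices = [24.5, 27.0, 31.8, 34.2, 38.9]

fig, (ax_year, ax_mileage) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: purchase year against price.
ax_year.scatter(years, prices)
ax_year.set_xlabel("Purchase year")
ax_year.set_ylabel("Price")

# Right panel: mileage against price.
ax_mileage.scatter(mileage, prices)
ax_mileage.set_xlabel("Mileage (10,000 km)")
ax_mileage.set_ylabel("Price")

fig.tight_layout()
```

Calling `fig.savefig("glc_scatter.png")` afterwards writes the figure to disk (the file name is a placeholder); in an interactive session, drop the `matplotlib.use("Agg")` line and call `plt.show()` instead.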

 

image

Five, process summary

image

 

image

 

 

Learning Python, and data analysis in particular, takes plenty of hands-on practice on real business problems. Here is a free way to get that practice. QQ group: 721195303


Origin blog.csdn.net/aaahtml/article/details/114251698