Travel information crawling and data analysis based on Python


Overview

  With the development of computer network technology, new programming languages have emerged one after another in recent years, and Python has become one of the most popular. Compared with many other high-level languages, Python has simple syntax, clear statements, and a rich set of practical modules and libraries, which makes code easier to write. It is also widely applied: from game development to data crawling to website construction, Python handles the job comfortably. Crawlers in particular have made the language even better known.
As one of the components of Internet search engines, crawlers can search for and collect useful information for us and reduce manual work. They can also collect information under user-defined conditions. For certain web pages, such as those listing house prices, stocks, or job postings, we can process the crawled data to get the information we need.
This article uses Python to implement a crawler that collects information from the Mafengwo travel website, then analyzes and processes the collected data to obtain the desired results.

Keywords: Python; HTML; crawler; tourism; Mafengwo

1. Research background and significance

  With the rapid development of the Internet in recent years, we have entered the era of big data. Information on the Internet has exploded in both volume and variety, which makes it harder and harder for people to find what they need. Search engines such as Google and Baidu emerged in response to this growth. A search engine collects and indexes a very large number of web pages of many different types, so even though the Internet holds many kinds of information, we can still find the relevant pages through a keyword search.
  A web crawler is an automated program and a component of a search engine. Different search engines choose crawling strategies suited to their search requirements. A traditional web crawler starts from a seed URL, fetches the target page, observes the structure of its URLs, constructs new URLs according to those structural rules, puts them into a queue, and keeps crawling in a loop until its requirements are met. A well-designed, efficient crawler helps people find more accurate information on the Internet.
  This article uses Python to collect and analyze information from the Mafengwo travel website. We first crawl the ID numbers of Mafengwo's tourist cities and, following the site's URL rules, splice them into the URLs of the city pages. For each URL we open the city page, observe the page structure, locate the required elements through their tags, crawl the information we need, and save it to a local file. Finally, we process and visualize the data in the file to suggest where to go when traveling.
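The queue-driven loop of a traditional crawler can be sketched as follows. This is a minimal illustration, not the project's actual code: the `crawl` function, its `get_links` callback, and the `FAKE_SITE` link graph are all hypothetical, and an in-memory dictionary stands in for real HTTP requests so the example runs offline.

```python
from collections import deque


def crawl(seed, get_links, max_pages=100):
    """Breadth-first crawl from `seed`; returns URLs in the order visited.

    `get_links(url)` is a hypothetical callback that returns the URLs
    found on a page. A real crawler would download and parse the page.
    """
    queue = deque([seed])   # URLs waiting to be visited
    seen = {seed}           # URLs already queued, to avoid revisiting
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Toy link graph used instead of a real website (assumption for the demo).
FAKE_SITE = {
    "/index": ["/city/1", "/city/2"],
    "/city/1": ["/city/2", "/poi/10"],
    "/city/2": ["/index"],
    "/poi/10": [],
}

order = crawl("/index", lambda url: FAKE_SITE.get(url, []))
print(order)
```

The `seen` set is what keeps the loop from cycling forever when pages link back to each other, and `max_pages` plays the role of the stopping requirement mentioned above.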

2. Design analysis

  A trip starts with choosing a city, so the first question is which cities are the top 10 most popular on Mafengwo. Next, we need the attraction data of each city to determine, through data analysis, the top 15 most popular attractions. Finally, we crawl food-related information to rank the top 15 most representative foods.

1 Obtaining city number

  To crawl tourism information we must cover many cities, and each city has its own URL on the Mafengwo website. By comparing these URLs, however, we find that every city and its attraction pages are identified by a specific five-digit or six-digit number. This number is the key to crawling different cities: with it, we can splice together the URL of each city and reach that city's travel page.
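The splicing step might look like the sketch below. The URL template is an assumption made for illustration (the article does not give the exact pattern), and the city numbers are made-up examples. The only rule taken from the text is that a city number has five or six digits.

```python
import re

# ASSUMPTION: illustrative URL template; the real pattern must be checked
# against the site itself.
CITY_URL_TEMPLATE = "https://www.mafengwo.cn/travel-scenic-spot/mafengwo/{city_id}.html"

# From the text: city numbers are five-digit or six-digit numbers.
CITY_ID_PATTERN = re.compile(r"\d{5,6}")


def build_city_url(city_id, template=CITY_URL_TEMPLATE):
    """Splice one city number into the URL template."""
    city_id = str(city_id)
    if not CITY_ID_PATTERN.fullmatch(city_id):
        raise ValueError(f"expected a 5- or 6-digit city number, got {city_id!r}")
    return template.format(city_id=city_id)


def build_city_urls(city_ids):
    """Map each city number to its spliced URL."""
    return {str(cid): build_city_url(cid) for cid in city_ids}


# Hypothetical city numbers, used only to show the call.
urls = build_city_urls([10065, "123456"])
for cid, url in urls.items():
    print(cid, url)
```

Validating the number before splicing catches malformed IDs early, before any request is sent.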

2 Crawling of city information

  Once we have the city number, we can build the city's URL on the Mafengwo website and open its travel page. At this point we need to decide what to crawl: which information is useful and will make the later data analysis credible? Here we rely on each city's number of travel notes, its number of impression tags, and its rankings for local food, shopping, and entertainment.
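Locating fields through their tags can be sketched with the standard library's `html.parser`. The HTML snippet and the class names `note-count` and `tag` are invented for this example; the real page structure has to be found by inspecting the site, and the original project may well have used a different parsing library.

```python
from html.parser import HTMLParser


class ClassTextParser(HTMLParser):
    """Collect the text inside every element that carries a given class.

    Simplification: assumes every opened tag is closed, so void tags such
    as a bare <br> inside a matching element are not handled.
    """

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.results = []
        self._depth = 0      # > 0 while inside a matching element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            self._depth += 1
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.results.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_texts(html, target_class):
    """Return the text of all elements whose class list contains target_class."""
    parser = ClassTextParser(target_class)
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical page fragment; values are placeholders, not crawled data.
SAMPLE_HTML = """
<div class="city">
  <span class="note-count">1024</span>
  <ul>
    <li class="tag">Seaside</li>
    <li class="tag">Street food</li>
  </ul>
</div>
"""

note_count = int(extract_texts(SAMPLE_HTML, "note-count")[0])
tags = extract_texts(SAMPLE_HTML, "tag")
print(note_count, tags)
```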

3 Processing of crawled information

  After we get the tourism information for each city, the last step is to visualize the data. First, we use bar charts to show the top 10 popular cities on the Mafengwo travel website, along with the top 15 cities by attraction tags, the top 15 cities by food and beverage tags, and the top 15 cities by entertainment and shopping tags. Next, we visualize the top 15 most popular attractions, the top 15 most popular foods, and the top 15 most popular entertainment and shopping spots, also as bar charts. Finally, we display a heat map of the most popular cities. This completes the visualization of the processed information.

Figure 3.3 Program flow chart
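The ranking step behind each "top N" chart can be sketched with the standard library alone. The functions `top_n` and `text_bar_chart` are hypothetical helpers, and the counts below are placeholder values, not crawled results. A plotting library such as matplotlib could replace the plain-text renderer to produce real bar charts.

```python
def top_n(counts, n):
    """Return the n (name, count) pairs with the largest counts.

    Ties are broken alphabetically by name so the result is deterministic.
    """
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


def text_bar_chart(ranking, width=30):
    """Render a ranking as a plain-text horizontal bar chart."""
    if not ranking:
        return ""
    largest = max(count for _, count in ranking) or 1  # avoid dividing by zero
    lines = []
    for name, count in ranking:
        bar = "#" * round(width * count / largest)
        lines.append(f"{name:<10} {bar} {count}")
    return "\n".join(lines)


# Placeholder data for the demo (not real Mafengwo figures).
note_counts = {"City A": 120, "City B": 300, "City C": 45, "City D": 300}

ranking = top_n(note_counts, 3)
print(text_bar_chart(ranking))
```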

3. Project realization

  

1 Visual picture display

Figure 4.4.1 Mafengwo nationwide top 10 cities by travel notes

Figure 4.4.2 Mafengwo nationwide top 15 cities by attraction tags

Figure 4.4.3 Mafengwo nationwide top 15 cities by food and beverage tags

Figure 4.4.4 Mafengwo nationwide top 15 cities by shopping and entertainment tags

Figure 4.4.5 Mafengwo nationwide top 15 attractions by popularity

Figure 4.4.6 Mafengwo nationwide top 15 foods by popularity

Figure 4.4.7 Mafengwo nationwide top 15 entertainment and shopping spots by popularity

Figure 4.4.8 Mafengwo nationwide tourism heat map, top 30

  From the charts we can see that Hulunbuir is the most visited destination on Mafengwo. Hulunbuir, in Inner Mongolia, is known for its grasslands, and the heat map clearly shows darker colors in northern China. Hulunbuir is also a summer resort: the weather there is cool in summer, which suits people who want to escape the heat and enjoy the grassland scenery on horseback. As the capital, Beijing is naturally also a destination for many travelers. It has many attractions, such as the Forbidden City, the Great Wall, and the Summer Palace, all of which are famous World Heritage sites. If you want to see the long history of the Chinese nation, Beijing is a good choice.
  Next is Xiamen. The data show that Xiamen ranks first for both attractions and dining, so it is clearly a favorite destination for many people. First, Xiamen lies on the coast, so its winters are mild and its summers are not too hot, which is ideal for tourism. Because it is coastal, however, remember to avoid the summer typhoon season. Second, the coastal scenery is a big draw, especially for people from inland areas who have never been to the beach. Third, there is the food. As the economy has developed and living standards have improved, people expect more from food: they want not only to be full but also to eat well. Xiamen has plenty of delicious food. In the top 15 food ranking, six entries are Xiamen specialties, including shacha (sand tea) noodles, fried oysters, bamboo shoot jelly, and peanut soup. Finally, there is price: compared with Hulunbuir and Beijing, Xiamen is an affordable destination.
  In terms of entertainment tags, Lijiang is the most popular destination. Its old-town scenery of small bridges, flowing water, and traditional houses, along with the snow-capped Jade Dragon Snow Mountain, makes it well worth visiting. In recent years, however, incidents involving women touting for bars in Lijiang have tarnished the city's reputation, so when traveling there, stay alert to bar scams and avoid falling into the trap.
Finally, the heat map shows that southern and coastal regions are relatively popular destinations. It seems that many people still prefer the food, pleasant climate, and coastal scenery of the south. Based on these data, have you figured out where you want to travel?

4. Summary

  Through this graduation project, I once again felt the charm of the Python programming language. Its simple, readable code and rich libraries left a deep impression on me: a few simple operations can accomplish complex tasks, which makes it easy to get hooked. Of course, I also ran into many difficulties along the way. When looking for page rules, I was sometimes stuck for a long time without any progress and did not know where to start, which greatly slowed the project down. At those times, my classmates and my tutor, Wu Ruiran, helped point me in the right direction. Discussing with classmates exposed me to different ways of thinking, and most of the time they helped me find another way to reach my goal, which benefited me a lot. Teacher Wu Ruiran guided me on how to think through and solve each difficulty. I would like to thank Teacher Wu Ruiran for the help, reference materials, and suggestions.
  This graduation project also taught me a lot that I did not know before, such as how to use various Python libraries, some of which I was using for the first time; this expanded my coding knowledge. It also developed my ability to complete tasks independently and built my self-confidence. I believe I will be able to overcome obstacles, go further, and learn more in my future programming journey.


Origin blog.csdn.net/m0_73485263/article/details/133418379