New Year's Day is coming soon, and it brings a rare 3-day holiday. I definitely want to get out and have fun, but where to go is the question. So Xiaoxiao took Xiamen, a popular tourist city, as an example: she used Python to collect attraction data from Qunar.com, including attraction names, districts, ratings, sales, prices, coordinates and other fields, then visualized the data and ran a simple analysis to look for cost-effective attractions.
Data collection
Collecting data from Qunar.com is relatively simple. After finding the real URL, build the request by splicing in the query parameters, fetch the JSON data with requests, and store the data as a CSV file in append mode.
The core code of the crawler is as follows:
import requests
import random
from time import sleep
import csv
import pandas as pd
from fake_useragent import UserAgent


def get_data(keyword, page):
    # Use a random User-Agent for each request
    ua = UserAgent(verify_ssl=False)
    headers = {"User-Agent": ua.random}
    url = f'http://piao.qunar.com/ticket/list.json?keyword={keyword}&region=&from=mpl_search_suggest&page={page}'
    res = requests.request("GET", url, headers=headers)
    sleep(random.uniform(1, 2))
    try:
        res_json = res.json()
        # print(res_json)
        sight_List = res_json['data']['sightList']
        print(sight_List)
    except (ValueError, KeyError, TypeError):
        # Skip pages that return no valid JSON or no attraction list
        pass


if __name__ == '__main__':
    keyword = "厦门"  # search keyword: Xiamen
    for page in range(1, 100):  # controls the number of pages
        print(f"Fetching page {page}")
        sleep(random.uniform(1, 2))
        get_data(keyword, page)
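The core code above only prints each page's sightList. As a minimal sketch of the two steps described earlier (building the URL from parameters and appending rows to a CSV file), something like the following could work. The helper names build_url and append_sights are hypothetical, and treating the column names from the later read_csv call as keys of each sightList item is an assumption that the source does not confirm.

```python
import csv
from urllib.parse import urlencode

BASE_URL = "http://piao.qunar.com/ticket/list.json"

# Assumption: these column names (taken from the article's read_csv call)
# are also the keys of each item in sightList.
FIELDS = ["name", "star", "score", "qunarPrice", "saleCount",
          "districts", "point", "intro"]


def build_url(keyword, page):
    """Build the list URL; urlencode percent-encodes the Chinese keyword."""
    params = {
        "keyword": keyword,
        "region": "",
        "from": "mpl_search_suggest",
        "page": page,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def append_sights(sight_list, path):
    """Append one header-less row per attraction and return the row count."""
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for sight in sight_list:
            # .get() leaves a blank cell when a field (e.g. intro) is missing
            writer.writerow([sight.get(field, "") for field in FIELDS])
    return len(sight_list)
```

Because the file is opened in "a" mode and no header row is written, repeated calls keep adding rows, which matches reading the file later with explicit column names.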
Data processing
Import related packages
First, import third-party libraries related to data processing and data visualization to facilitate subsequent operations.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
plt.rcParams['font.sans-serif'] = ['SimHei'] # font that can render Chinese labels
plt.rcParams['axes.unicode_minus'] = False # keep the minus sign '-' from rendering as a box in saved figures
import jieba
import re
from pyecharts.charts import *
from pyecharts import options as opts
from pyecharts.globals import ThemeType
import stylecloud
from IPython.display import Image
Import attraction data
Use pandas to read the crawled csv format attraction data and preview it.
df = pd.read_csv("/程序员晓晓Python/旅游/厦门旅游景点.csv",names=['name', 'star', 'score','qunarPrice','saleCount','districts','point','intro'])
df.head()
Remove duplicate data
There is a certain amount of duplicate data on the website that needs to be eliminated.
df = df.drop_duplicates()
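As a small illustration on made-up data (not the crawled data set), the number of exact duplicate rows can be counted before they are dropped. The demo frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical rows; the second row is an exact copy of the first
demo = pd.DataFrame({
    "name": ["A", "A", "B"],
    "qunarPrice": [10.0, 10.0, 25.0],
})

n_duplicates = int(demo.duplicated().sum())  # rows identical to an earlier row
deduped = demo.drop_duplicates()             # keeps the first occurrence

print(n_duplicates, len(deduped))
print(list(deduped.index))
```

drop_duplicates keeps the original index labels, so the index has gaps after deduplication. This is why a row count can be smaller than the largest index label suggests.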
View data information
Check the field types and missing values. Apart from some missing introductions in the intro column (377 of 422 rows are non-null), the data meets the analysis needs, so no additional processing is required.
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 422 entries, 0 to 423
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 name 422 non-null object
1 star 422 non-null object
2 score 422 non-null float64
3 qunarPrice 422 non-null float64
4 saleCount 422 non-null int64
5 districts 422 non-null object
6 point 422 non-null object
7 intro 377 non-null object
dtypes: float64(2), int64(1), object(5)
memory usage: 29.7+ KB
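The df.info() output shows that only intro has missing values. A minimal sketch of counting missing values per column and skipping them before text analysis, on hypothetical demo data, might look like this:

```python
import pandas as pd

# Hypothetical rows; one attraction has no introduction text
demo = pd.DataFrame({
    "name": ["A", "B", "C"],
    "intro": ["sea view", None, "old street"],
})

missing = demo.isna().sum()          # missing-value count per column
intro_text = demo["intro"].dropna()  # only rows that have an introduction

print(missing)
print(len(intro_text))
```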
Descriptive statistics
The descriptive statistics table shows that 422 attractions remain after duplicates are removed, and the average ticket price is about 40 yuan.
color_map = sns.light_palette('orange', as_cmap=True) # light_palette color map
df.describe().style.background_gradient(color_map)
Visual analysis
Attractions
By drawing a word cloud diagram of the introduction text of Xiamen's attractions, we can easily see the characteristics of Xiamen. As a typical coastal leisure city, words such as sailing boats, Gulangyu Island, and yachts are mentioned a lot, and words such as buildings and museums are also mentioned to some extent, reflecting Xiamen's strong cultural atmosphere.
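# Draw the word cloud
# get_cut_words is the author's word-segmentation helper; its definition is not shown here
text1 = get_cut_words(content_series=df['intro'])
stylecloud.gen_stylecloud(text=' '.join(text1), max_words=100,
                          collocations=False,
                          font_path='simhei.ttf',
                          icon_name='fas fa-heart',
                          size=653,
                          # palette='matplotlib.Inferno_9',
                          output_name='./xiamen.png')  # must match the file displayed below
Image(filename='./xiamen.png')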
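The get_cut_words helper is called above but not defined in the article. One possible sketch is shown below, assuming it tokenizes each introduction with jieba, drops stop words and single-character tokens, and returns a flat word list. The stop-word set, the cut parameter and the filtering rules are all assumptions, not the author's actual implementation.

```python
import re

# Hypothetical stop-word set; the original list is not shown in the article
STOP_WORDS = {"的", "了", "和", "是", "在"}


def get_cut_words(content_series, cut=None, stop_words=STOP_WORDS):
    """Tokenize every text entry and return one flat list of words."""
    if cut is None:
        import jieba  # imported lazily so another tokenizer can be passed in
        cut = jieba.lcut
    words = []
    for text in content_series:
        if not isinstance(text, str):
            continue  # skip NaN values from missing introductions
        # Keep only Chinese characters, letters and digits
        cleaned = re.sub(r"[^\u4e00-\u9fa5A-Za-z0-9]", " ", text)
        for word in cut(cleaned):
            word = word.strip()
            if len(word) >= 2 and word not in stop_words:
                words.append(word)
    return words


# Usage with a plain whitespace tokenizer instead of jieba
print(get_cut_words(["sea view, old town", float("nan"), "a sea"], cut=str.split))
```

Skipping non-string entries matters here because intro contains missing values, which pandas represents as NaN floats.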
Distribution of attractions
Use kepler.gl to draw the distribution map of tourist attractions in Xiamen, with the size of each circle indicating monthly ticket sales. Xiamen's attractions are clearly concentrated in Siming District and Huli District and are more scattered in other areas. Siming District in particular is far ahead of the other districts in ticket sales. The coordinates are first split into longitude and latitude columns and exported to a CSV file for the map.
df["lon"] = df["point"].str.split(",",expand=True)[0]
df["lat"] = df["point"].str.split(",",expand=True)[1]
df.to_csv("/程序员晓晓Python/data.csv")
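The split above leaves lon and lat as strings. As a hedged variant, the point column can be split once and converted to floats so the values are numeric before export. The coordinates below are made up for illustration, and the "lon,lat" order follows the article's own column assignment.

```python
import pandas as pd

# Hypothetical coordinates in "lon,lat" form
demo = pd.DataFrame({
    "name": ["A", "B"],
    "point": ["118.07,24.45", "118.10,24.48"],
})

# Split once, then convert both parts to float
coords = demo["point"].str.split(",", expand=True).astype(float)
demo["lon"] = coords[0]
demo["lat"] = coords[1]

print(demo[["name", "lon", "lat"]])
```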
Rating TOP10 attractions
Judging from the ratings, Xiamen University scores highest, with a perfect 5. It is followed by Gulangyu Island and Nanputuo Temple, with scores of 4.9 and 4.6 respectively. No wonder some people say that if you have never been to Xiamen University and Gulangyu Island, you have never been to Xiamen.
df_score = df.pivot_table(index='name',values='score')
df_score.sort_values('score',inplace=True,ascending=False)
df_score[:10]
Monthly sales TOP10 attractions
In terms of monthly ticket sales, Gulangyu Island ranks first with 1,230 tickets sold, followed by Xiamen Botanical Garden and the Gulangyu round-trip ferry. Xiamen Fantawild Dreamland also sells more than 600 tickets a month.
df_saleCount = df.pivot_table(index='name',values='saleCount')
df_saleCount.sort_values('saleCount',inplace=True,ascending=False)
df_saleCount[:10]
Price TOP20 attractions
Judging from ticket prices, activities such as yachting, helicopter rides and sailing are relatively expensive, and Xiamen Fantawild is not cheap either. If you are not sensitive to price, you can consider them. If you are traveling on a budget, you can rule them out in advance.
df_qunarPrice = df.pivot_table(index='name',values='qunarPrice')
df_qunarPrice.sort_values('qunarPrice',inplace=True,ascending=False)
df_qunarPrice[:20]
Monthly sales revenue TOP20 attractions
Monthly sales revenue is computed as price multiplied by monthly sales volume. Because sales volume over the past month varies less across Xiamen's attractions than price does, revenue is driven mainly by price. The ranking below also shows that the attractions with the highest monthly revenue are still yachts, Fantawild and the like.
df["saleTotal"] = df["qunarPrice"]*df["saleCount"]
df_saleTotal = df.pivot_table(index='name',values='saleTotal')
df_saleTotal.sort_values('saleTotal',inplace=True,ascending=False)
df_saleTotal[:20]
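The four rankings above follow the same pattern: pivot_table with only index and values averages the column per attraction name (its default aggregation is the mean), then the result is sorted in descending order. A hypothetical helper, top_n, could express this once. The sketch below uses groupby and nlargest on made-up data and is an alternative formulation, not the author's code.

```python
import pandas as pd


def top_n(df, column, n=10):
    """Average `column` per attraction name and return the n largest values."""
    return df.groupby("name")[column].mean().nlargest(n)


# Hypothetical data; attraction "B" appears twice with different prices
demo = pd.DataFrame({
    "name": ["A", "B", "B", "C"],
    "qunarPrice": [10.0, 30.0, 50.0, 20.0],
    "saleCount": [100, 5, 5, 40],
})
demo["saleTotal"] = demo["qunarPrice"] * demo["saleCount"]

print(top_n(demo, "qunarPrice", n=2))
print(top_n(demo, "saleTotal", n=2))
```

Note that when several rows share a name, both this helper and the pivot_table calls rank the per-name average rather than the individual rows.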
Attraction level distribution
Judging from the distribution of tourist attraction levels in Xiamen, less than 5% of tourist attractions are rated 3A or above.
df_star = df["star"].value_counts()
df_star = df_star.sort_values(ascending=False)
#print(df_star)
c = (
    Pie(init_opts=opts.InitOpts(theme=ThemeType.WALDEN))
    .add(
        "",
        [list(z) for z in zip(df_star.index.to_list(), df_star.to_list())]
    )
    .set_global_opts(
        legend_opts=opts.LegendOpts(is_show=False),
        title_opts=opts.TitleOpts(
            title="Attraction level distribution",
            subtitle="Data source: Qunar.com\nChart by: 程序员晓晓Python",
            pos_top="0.5%",
            pos_left='left',
        ),
    )
    .set_series_opts(label_opts=opts.LabelOpts(formatter="{b}:{d}%", font_size=16))
)
c.render_notebook()
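The share of rated attractions shown by the pie chart can also be checked numerically. The sketch below uses a made-up star column, where '无' marks an attraction without a rating as in the article's data. The values and resulting percentages are illustrative only and are not the real distribution.

```python
import pandas as pd

# Hypothetical star levels; '无' means the attraction has no rating
stars = pd.Series(
    ["无", "无", "无", "5A", "4A", "无", "3A", "无", "无", "无"],
    name="star",
)

share = stars.value_counts(normalize=True)  # fraction of attractions per level
rated_share = (stars != "无").mean()         # fraction with any rating

print(share)
print(f"rated: {rated_share:.1%}")
```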
The attractions rated 3A or above can be listed as follows ('无' in the star column means no rating):
df[df["star"]!='无'].sort_values("star",ascending=False)
Summary
From the simple analysis above, we can draw roughly the following takeaways:
1. Xiamen is a typical coastal leisure city with rich marine and cultural landscapes;
2. Xiamen tourist attractions are mainly concentrated in Siming District, and are relatively scattered in other areas;
3. Xiamen University has the highest rating, followed by Gulangyu Island;
4. Gulangyu Island ticket sales are far ahead of other attractions in Xiamen;
5. High-cost attractions or activities include yachts, sailing boats and Fantawild.