Where to go on New Year's Day? Using Python to analyze a popular tourist city and find its most cost-effective attractions

New Year's Day is coming soon, and it's a rare 3-day holiday. I definitely want to get out and have fun, but where to go is the question. So Xiaoxiao took Xiamen, a popular tourist city, as an example and used Python to collect attraction data from Qunar.com, including attraction names, districts, ratings, sales, prices, coordinates, and other fields. She then visualized the data and ran a simple analysis to look for cost-effective attractions.

Data collection

Collecting data from Qunar.com is relatively simple. After finding the real request URL, build the query string from the parameters, use requests to fetch the JSON data, and store it in a CSV file opened in append mode.

The core code of the crawler is as follows:

import requests
import random
from time import sleep
import csv
import pandas as pd
from fake_useragent import UserAgent

def get_data(keyword, page):
    # Use a random User-Agent for each request to reduce the chance of being blocked
    ua = UserAgent(verify_ssl=False)
    headers = {"User-Agent": ua.random}
    url = f'http://piao.qunar.com/ticket/list.json?keyword={keyword}&region=&from=mpl_search_suggest&page={page}'
    res = requests.request("GET", url, headers=headers)
    sleep(random.uniform(1, 2))  # random pause between requests
    try:
        res_json = res.json()
        #print(res_json)
        sight_List = res_json['data']['sightList']
        print(sight_List)
    except (ValueError, KeyError, TypeError):
        # Skip pages that return no JSON or lack the expected fields
        pass

if __name__ == '__main__':
    keyword = "厦门"  # Xiamen
    for page in range(1, 100):  # controls the number of pages
        print(f"Extracting page {page}")
        sleep(random.uniform(1, 2))
        get_data(keyword, page)
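
The core code above only prints each page of results, while the text describes storing them in a CSV file in append mode. A minimal sketch of that step might look like the following. The helper name append_sights and its key_map parameter are hypothetical, and the sketch assumes each item in sightList is a dict whose keys match the column names used later with pd.read_csv; the real JSON keys may differ.

```python
import csv

# Column order matches the names later passed to pd.read_csv in this article.
FIELDS = ['name', 'star', 'score', 'qunarPrice', 'saleCount',
          'districts', 'point', 'intro']


def append_sights(sights, path, key_map=None):
    """Append one CSV row per attraction dict and return the row count.

    key_map maps a column name to the JSON key that holds it. Columns not
    listed there are looked up under their own name (an assumption about
    the response format). Missing keys become empty cells.
    """
    key_map = key_map or {}
    # Mode 'a' appends, so each page adds rows without overwriting earlier ones.
    with open(path, 'a', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        for sight in sights:
            writer.writerow([sight.get(key_map.get(col, col), '') for col in FIELDS])
    return len(sights)
```

Inside get_data, a call such as append_sights(sight_List, "厦门旅游景点.csv") could take the place of the print. No header row is written, which fits the later read_csv call that supplies the column names itself. If a field were stored under a different key (for example a hypothetical sightName for the name), it could be remapped with key_map={'name': 'sightName'}.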

Data processing

Import related packages

First, import third-party libraries related to data processing and data visualization to facilitate subsequent operations.

import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
plt.rcParams['font.sans-serif'] = ['SimHei']  # font used to render Chinese labels
plt.rcParams['axes.unicode_minus'] = False   # keep the minus sign '-' from showing as a box in saved figures
import jieba
import re
from pyecharts.charts import *
from pyecharts import options as opts 
from pyecharts.globals import ThemeType  
import stylecloud
from IPython.display import Image

Import attraction data

Use pandas to read the crawled csv format attraction data and preview it.

df = pd.read_csv("/程序员晓晓Python/旅游/厦门旅游景点.csv",names=['name', 'star', 'score','qunarPrice','saleCount','districts','point','intro'])
df.head()

Remove duplicate data

There is a certain amount of duplicate data on the website that needs to be eliminated.

df = df.drop_duplicates()

View data information

Check the field types and missing values. Only intro has missing values (377 of 422 rows are non-null); the other fields are complete, so the data meets the analysis needs without additional processing.

df.info()

    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 422 entries, 0 to 423
    Data columns (total 8 columns):
     #   Column      Non-Null Count  Dtype  
    ---  ------      --------------  -----  
     0   name        422 non-null    object 
     1   star        422 non-null    object 
     2   score       422 non-null    float64
     3   qunarPrice  422 non-null    float64
     4   saleCount   422 non-null    int64  
     5   districts   422 non-null    object 
     6   point       422 non-null    object 
     7   intro       377 non-null    object 
    dtypes: float64(2), int64(1), object(5)
    memory usage: 29.7+ KB

Descriptive statistics

The descriptive statistics show that 422 attractions remain after removing duplicates, and the average ticket price is 40 yuan.

color_map = sns.light_palette('orange', as_cmap=True)  # light_palette color map
df.describe().style.background_gradient(color_map)

Visual analysis

Attractions

By drawing a word cloud diagram of the introduction text of Xiamen's attractions, we can easily see the characteristics of Xiamen. As a typical coastal leisure city, words such as sailing boats, Gulangyu Island, and yachts are mentioned a lot, and words such as buildings and museums are also mentioned to some extent, reflecting Xiamen's strong cultural atmosphere.

# Draw the word cloud
# get_cut_words is a tokenizing helper that is not shown in this article
text1 = get_cut_words(content_series=df['intro'])
stylecloud.gen_stylecloud(text=' '.join(text1), max_words=100,
                          collocations=False,
                          font_path='simhei.ttf',
                          icon_name='fas fa-heart',
                          size=653,
                          #palette='matplotlib.Inferno_9',
                          output_name='./xiamen.png')  # must match the file displayed below
Image(filename='./xiamen.png')
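
The word-cloud code calls get_cut_words, which the article does not define. The following is only a sketch of what such a helper might look like, not the author's version. It assumes the helper tokenizes each introduction with jieba.lcut and keeps words of two or more characters; the cut and stop_words parameters are hypothetical additions so that another tokenizer or a stop-word list can be plugged in.

```python
import re


def get_cut_words(content_series, cut=None, stop_words=()):
    """Tokenize an iterable of texts and return words of two or more characters."""
    if cut is None:
        import jieba  # imported lazily so a custom tokenizer can be used instead
        cut = jieba.lcut
    stop = set(stop_words)
    words = []
    for text in content_series:
        if not isinstance(text, str):
            # Missing introductions show up as NaN/None; skip them
            continue
        # Replace everything except Chinese characters, letters and digits with spaces
        cleaned = re.sub(r'[^\u4e00-\u9fa5A-Za-z0-9]+', ' ', text)
        for word in cut(cleaned):
            word = word.strip()
            if len(word) >= 2 and word not in stop:
                words.append(word)
    return words
```

Dropping single-character tokens is a common way to remove particles and punctuation remnants from Chinese text before building a word cloud, but the threshold is a design choice and can be adjusted.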

Distribution of attractions

Use kepler.gl to draw the distribution map of tourist attractions in Xiamen, with the size of each circle indicating monthly ticket sales. Xiamen's attractions are clearly concentrated in Siming District and Huli District and are more scattered elsewhere. Ticket sales in Siming District in particular are far ahead of those in other districts.

df["lon"] = df["point"].str.split(",",expand=True)[0] 
df["lat"] = df["point"].str.split(",",expand=True)[1] 
df.to_csv("/程序员晓晓Python/data.csv")

Top 10 attractions by rating

Judging from the ratings, Xiamen University scores highest, with a perfect 5, followed by Gulangyu Island and Nanputuo Temple at 4.9 and 4.6 respectively. No wonder some people say that if you have never been to Xiamen University and Gulangyu Island, you have never been to Xiamen.

df_score = df.pivot_table(index='name',values='score')
df_score.sort_values('score',inplace=True,ascending=False)
df_score[:10]
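
The rating ranking above and the sales and price rankings in the next sections all follow the same pattern: average one column per attraction name, sort in descending order, and take the first rows. As a sketch, that pattern could be wrapped in a small helper. The name top_n is hypothetical, and groupby plus nlargest is swapped in here for the article's pivot_table plus sort_values; both average rows that share a name.

```python
import pandas as pd


def top_n(df, value_col, n=10, name_col='name'):
    """Average value_col per attraction name and return the n largest as a DataFrame."""
    return (
        df.groupby(name_col)[value_col]
          .mean()          # same default aggregation as pivot_table
          .nlargest(n)     # sorted descending, limited to n rows
          .to_frame()
    )


# Made-up rows for illustration only
demo = pd.DataFrame({
    'name': ['A', 'A', 'B', 'C'],
    'score': [4.0, 5.0, 4.8, 3.0],
})
top2 = top_n(demo, 'score', n=2)
print(top2)
```

With the article's data, calls such as top_n(df, 'score'), top_n(df, 'saleCount') and top_n(df, 'qunarPrice', n=20) would cover the three rankings.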

Top 10 attractions by monthly sales

In terms of monthly ticket sales, Gulangyu Island ranks first with 1,230 tickets sold, followed by Xiamen Botanical Garden and the Gulangyu round-trip ferry. Xiamen Fantawild Dreamland also sells more than 600 tickets a month.

df_saleCount = df.pivot_table(index='name',values='saleCount')
df_saleCount.sort_values('saleCount',inplace=True,ascending=False)
df_saleCount[:10]

Top 20 attractions by price

Judging from prices, activities such as yacht trips, helicopter rides, and sailing are relatively expensive, and Xiamen Fantawild is not cheap either. If you are not sensitive to price, they are worth considering; if you are traveling on a budget, you can rule them out in advance.

df_qunarPrice = df.pivot_table(index='name',values='qunarPrice')
df_qunarPrice.sort_values('qunarPrice',inplace=True,ascending=False)
df_qunarPrice[:20]

Top 20 attractions by monthly sales revenue

Monthly sales revenue is the ticket price multiplied by the monthly sales count. Because sales counts vary less across Xiamen's attractions than prices do, revenue is driven mainly by price. The ranking below confirms this: the attractions with the highest monthly revenue are still yachts, Fantawild, and the like.

df["saleTotal"] = df["qunarPrice"]*df["saleCount"]
df_saleTotal = df.pivot_table(index='name',values='saleTotal')
df_saleTotal.sort_values('saleTotal',inplace=True,ascending=False)
df_saleTotal[:20]

Attraction level distribution

Judging from the distribution of tourist attraction levels in Xiamen, less than 5% of tourist attractions are rated 3A or above.

# Count attractions per rating level, largest group first
df_star = df["star"].value_counts()
df_star = df_star.sort_values(ascending=False)
#print(df_star)
c = (
    Pie(init_opts=opts.InitOpts(theme=ThemeType.WALDEN))
    .add(
        "",
        [list(z) for z in zip(df_star.index.to_list(), df_star.to_list())]
    )
    .set_global_opts(
        legend_opts=opts.LegendOpts(is_show=False),
        title_opts=opts.TitleOpts(
            title="Attraction level distribution",
            subtitle="Data source: Qunar.com\nChart: 程序员晓晓Python",
            pos_top="0.5%",
            pos_left='left',
        ),
    )
    .set_series_opts(label_opts=opts.LabelOpts(formatter="{b}:{d}%", font_size=16))
)
c.render_notebook()
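
The pie chart shows the shares visually. To check a claim such as "less than 5% are rated" numerically, the shares can also be computed directly. This is a minimal sketch: the function name star_shares is hypothetical, and it assumes that unrated attractions carry the label '无', as in the filter used in this section. It counts every other label as rated, so if levels below 3A exist in the data they would need to be excluded separately.

```python
from collections import Counter


def star_shares(stars, unrated='无'):
    """Return (share of each level, combined share of all rated levels)."""
    counts = Counter(stars)
    total = sum(counts.values())
    if total == 0:
        return {}, 0.0
    shares = {level: count / total for level, count in counts.items()}
    rated_share = 1.0 - shares.get(unrated, 0.0)
    return shares, rated_share


# Made-up labels for illustration only
shares, rated = star_shares(['无'] * 8 + ['4A', '5A'])
print(shares, round(rated, 2))
```

With the article's data, star_shares(df["star"]) would work as well, since iterating over a pandas Series yields its values.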

df[df["star"]!='无'].sort_values("star",ascending=False)

The filter above lists the attractions rated 3A and above.

Summary

From the simple analysis above, we can draw the following rough conclusions:

1. Xiamen is a typical coastal leisure city with rich marine and cultural landscapes;

2. Xiamen tourist attractions are mainly concentrated in Siming District, and are relatively scattered in other areas;

3. Xiamen University has the highest reputation, followed by Gulangyu Island;

4. Gulangyu Island ticket sales are far ahead of other attractions in Xiamen;

5. High-cost attractions or activities include yachts, sailing boats and Fantawild.
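
The analysis ranks attractions by rating, sales, and price separately but does not combine them into a single cost-effectiveness measure. One possible way to do so, offered only as a sketch, is to divide the rating by the ticket price. The metric score_per_yuan, the function name value_ranking, and the min_price and min_sales thresholds are all assumptions rather than part of the original analysis; the column names are the ones used throughout the article.

```python
import pandas as pd


def value_ranking(df, n=10, min_price=1.0, min_sales=1):
    """Rank attractions by rating points per yuan (a hypothetical metric)."""
    # Drop free or unsold entries, which would otherwise distort the ratio
    paid = df[(df['qunarPrice'] >= min_price) & (df['saleCount'] >= min_sales)].copy()
    paid['score_per_yuan'] = paid['score'] / paid['qunarPrice']
    cols = ['name', 'score', 'qunarPrice', 'saleCount', 'score_per_yuan']
    return paid.sort_values('score_per_yuan', ascending=False)[cols].head(n)


# Made-up rows for illustration only
demo = pd.DataFrame({
    'name': ['garden', 'yacht', 'free park', 'temple'],
    'score': [4.5, 4.8, 4.0, 4.6],
    'qunarPrice': [30.0, 300.0, 0.0, 10.0],
    'saleCount': [500, 80, 0, 200],
})
ranking = value_ranking(demo, n=3)
print(ranking)
```

A simple ratio like this favors cheap attractions, so raising min_sales or filtering on a minimum score first may give a more useful shortlist.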


Origin blog.csdn.net/cxyxx12/article/details/135267468