What I found when I used Python to collect detailed information on national franchise brands and visualized the results

My cousin came to me saying he wanted to open a franchise store. He had no idea which brand was worth joining, so he asked me to act as his consultant.

Fortunately, I know Python. I collected nationwide franchise brand information in a few minutes, and after a bit of analysis I picked out the most suitable brand for him.

Without further ado, let's get straight to the practical part!

Preparation

Development environment

  • Python 3.8
  • PyCharm
  • Jupyter

Modules used

  • requests
  • parsel
  • csv
  • pandas
  • pyecharts

Third-party modules need to be installed manually: press Win + R, type cmd, and run the installation command pip install <module name>. If the installation feels slow, you can switch to a domestic mirror source.
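For example, to install everything used in this article in one go, run this in the cmd window (csv comes with Python and does not need installing):

pip install requests parsel pandas pyecharts

If the download is slow, you can append a domestic mirror, for example the Tsinghua mirror:

pip install requests parsel pandas pyecharts -i https://pypi.tuna.tsinghua.edu.cn/simple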

Getting the data

Process steps

Clarify the requirements

Determine the target URL and the data to collect:
URL: http://www.******.com/brandList (replace the asterisks with winshangdata)
Data: basic information for each brand

1. Packet capture analysis: find the request that returns the data

  • Open the developer tools (F12 / Fn+F12, or right-click --> Inspect) and switch to the Network panel
  • Refresh the page
  • Search for the data

2. How to get data for multiple brands

Compare the link addresses of several single-brand data packets and look for the pattern:
only the brandId changes from packet to packet.
Then check whether all the brand IDs can be found in one data packet --> capture packets on the list page (see the sketch below).
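A minimal sketch of that pattern (the brand IDs below are made-up placeholders; only the brandId part of the link changes):

# two captured detail links, differing only in brandId (placeholder IDs)
#   http://www.***.com/brandDetail?brandId=1001
#   http://www.***.com/brandDetail?brandId=1002
# so every detail link can be built from a brand ID taken from the list-page packet
brand_id = 1001  # placeholder; the real IDs come from the list-page packet
link = f'http://www.***.com/brandDetail?brandId={brand_id}'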

Code implementation steps

1. Send a request for the data packet containing the brand IDs

2. Get the data: retrieve the response the server returns

3. Parse the data and extract what we need ==> the brand IDs

4. Send a request for each brand's detail page

5. Get the data: retrieve the response the server returns

6. Parse the data and extract what we need ==> basic brand information

7. Save the data to a table file

Code walkthrough

Modules

# import the data-request module (needs to be installed)
import requests
# import the data-parsing module
import parsel
# import the csv module
import csv

1. Send request

# pretend to be a browser <request-header disguise>
headers = {
    # User-Agent identifies the basic browser information
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
}
for page in range(1, 11):
    print(f'--- Collecting data from page {page} ---')
    # request URL
    url = 'http://www.****.com/wsapi/brand/list3_4'
    # form data to submit
    data = {
        "isHaveLink": "",
        "isTuozhan": "",
        "isXxPp": "",
        "kdfs": "",
        "key": "",
        "orderBy": "1",
        "pageNum": page,
        "pageSize": 60,
        "pid": "",
        "qy_p": "",
        "qy_r": "",
        "xqMj": "",
        "ytlb1": "",
        "ytlb2": ""
    }
    # send the request
    response = requests.post(url, json=data, headers=headers)

2. Get data

# the server returns JSON; print it to check
json_data = response.json()
print(json_data)

3. Parse the data

json_data = response.json()
for index in json_data['data']['list']:
    # build the detail-page link from each brand ID
    link = f'http://www.***.com/brandDetail?brandId={index["brandId"]}'

4-5. Send a request for the detail page and get the data

# request the detail page and print it to check (break here to test the first brand only)
html_data = requests.get(link, headers=headers).text
print(html_data)
break

6. Parse the data

html_data = requests.get(link, headers=headers).text
selector = parsel.Selector(html_data)
title = selector.css('h1.detail-one-tit::text').get().strip()  # brand name
company = selector.css('p.detail-company::text').get()  # company name
info = selector.css('div.detail-three-tit::text').getall()
value = selector.css('span.detail-option-value::text').getall()
dit = {
    '品牌': title,
    '公司': company,
    '业态类别': info[0],
    '拓展状态': info[1],
    '创立时间': value[0],
    '人均消费/客单价': value[1].strip(),
    '开店方式': value[2],
    '合作期限': value[3],
    '面积要求': value[4],
    '已进购物中心': value[5],
    '详情页': link,
}
# write the row to the csv file
csv_writer.writerow(dit)
print(dit)

7. Save the data to a table file (in the full script, this setup code goes before the request loop)

# create the csv file and a DictWriter with the column headers
f = open('品牌.csv', mode='w', encoding='utf-8', newline='')
csv_writer = csv.DictWriter(f, fieldnames=[
    '品牌',
    '公司',
    '业态类别',
    '拓展状态',
    '创立时间',
    '人均消费/客单价',
    '开店方式',
    '合作期限',
    '面积要求',
    '已进购物中心',
    '详情页',
])
csv_writer.writeheader()

Data visualization

Read the table

import pandas as pd

df = pd.read_csv('data.csv')  # the csv saved in the collection step (named 品牌.csv above)
df.head()
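Before cleaning, it can be worth checking the raw column names and types, for example:

# show column names, non-null counts and dtypes of the raw table
df.info()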

Convert the 已进购物中心 ('malls entered') column to string, use str.replace() to strip the counter word '家' and replace '--' with 0, then convert the column to integer.

df['已进购物中心'] = df['已进购物中心'].astype(str).str.replace('家', '').str.replace('--', '0').astype(int)

Convert the 人均消费/客单价 ('per capita consumption / average ticket') column to string, use str.replace() to replace '--' with 0, split the string on '-' with str.split('-'), take the first element with str.get(0) (i.e. the lower bound of the consumption range), and finally convert it to integer.

df['人均消费'] = df['人均消费/客单价'].astype(str).str.replace('--', '0').str.split('-').str.get(0).astype(int)
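As a quick illustration with made-up values, a cell such as '50-80' ends up as the integer 50, and '--' ends up as 0:

import pandas as pd

# made-up sample values, just to show what the chain above does
s = pd.Series(['50-80', '--', '120-150'])
print(s.str.replace('--', '0').str.split('-').str.get(0).astype(int).tolist())  # [50, 0, 120]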

Sort the brands by 已进购物中心 in descending order and take the top 10:

top10 = df[['品牌', '已进购物中心', '人均消费']].sort_values('已进购物中心', ascending=False)[:10]
ShopList = list(top10['品牌'])
counts = list(top10['已进购物中心'])
price = list(top10['人均消费'])
print(ShopList)
print(counts)
print(price)

Use 已进购物中心 and 人均消费 as two data series and display them in a grouped bar chart:

from pyecharts import options as opts
from pyecharts.charts import Bar

c = (
    Bar()
    .add_xaxis(ShopList)
    .add_yaxis("已进购物中心", counts)
    .add_yaxis("人均消费", price)
    .set_global_opts(
        xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=-15)),
        title_opts=opts.TitleOpts(title="加盟品牌Top10", subtitle="已进购物中心"),
    )
)
c.render_notebook()

Per capita consumption in descending order

top10 = df[['品牌', '人均消费']].sort_values('人均消费', ascending=False)[:10]
ShopList = list(top10['品牌'])
price = list(top10['人均消费'])
from pyecharts import options as opts
from pyecharts.charts import Bar

c = (
    Bar()
    .add_xaxis(ShopList)
    .add_yaxis("人均消费", price)
    .set_global_opts(
        xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=-15)),
        title_opts=opts.TitleOpts(title="加盟品牌Top10", subtitle="人均消费"),
    )
)
c.render_notebook()

Area requirements in descending order

df['面积要求'] = df['面积要求'].astype(str).str.replace('--', '0').str.split('-').str.get(0).astype(int)
top10 = df[['品牌', '面积要求']].sort_values('面积要求', ascending=False)[:10]
ShopList = list(top10['品牌'])
area = list(top10['面积要求'])
from pyecharts import options as opts
from pyecharts.charts import Bar

c = (
    Bar()
    .add_xaxis(ShopList)
    .add_yaxis("面积要求", area)
    .set_global_opts(
        xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=-15)),
        title_opts=opts.TitleOpts(title="加盟品牌Top10", subtitle="面积要求"),
    )
)
c.render_notebook()

Area requirements line chart

import pyecharts.options as opts
from pyecharts.charts import Line


c = (
    Line()
    .add_xaxis(ShopList)
    .add_yaxis("面积要求", area, is_connect_nones=True)
    .set_global_opts(title_opts=opts.TitleOpts(title="加盟品牌Top10-面积"))
#     .render("line_connect_null.html")
)
c.render_notebook()

Pie chart of store-opening methods

from pyecharts import options as opts
from pyecharts.charts import Pie

# build the pie-chart data from the 开店方式 column
# (one way to define the types/nums series used below)
counts = df['开店方式'].value_counts()
types = list(counts.index)
nums = [int(v) for v in counts.values]

c = (
    Pie()
    .add(
        "",
        [
            list(z)
            for z in zip(types, nums)
        ],
        center=["40%", "50%"],
    )
    .set_global_opts(
        title_opts=opts.TitleOpts(title="开店方式"),
        legend_opts=opts.LegendOpts(type_="scroll", pos_left="80%", orient="vertical"),
    )
    .set_series_opts(label_opts=opts.LabelOpts(formatter="{b}: {c}"))
#     .render("pie_scroll_legend.html")
)
c.render_notebook()

Well, that's the end of today's article~



Origin blog.csdn.net/ooowwq/article/details/132006664