Exploring the world of Genshin Impact with a Python crawler (2): character fan art

I. Introduction

The previous article used a Python crawler to open the door to character posts. Today we will take a look at character fan art.

II. Page analysis

First, open Miyoushe, find the "best" (curated) area of the fan-art section, and open the browser's developer tools to inspect the page.

As you scroll through the picture page, more and more JSON data appears in the Network panel on the right. From this data, the first thing we need is the request URL. To make the request reusable for crawling more pictures, we keep only the part before the question mark, namely:


https://bbs-api.mihoyo.com/post/wapi/forumGoodPostFullList?
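If you want to confirm what sits before the question mark and which query parameters the captured URL carries, you can split it with the standard library. A minimal sketch (the parameter values are taken from the full code below):

from urllib.parse import urlsplit, parse_qs

# Full request URL as copied from the browser's developer tools
captured = ('https://bbs-api.mihoyo.com/post/wapi/forumGoodPostFullList'
            '?forum_id=29&gids=2&last_id=0&page_size=20')
parts = urlsplit(captured)
# Everything up to and including the '?'
base_url = '{}://{}{}?'.format(parts.scheme, parts.netloc, parts.path)
# The query parameters as a dict (parse_qs maps each key to a list of values)
params = parse_qs(parts.query)
print(base_url)  # https://bbs-api.mihoyo.com/post/wapi/forumGoodPostFullList?
print(params)    # {'forum_id': ['29'], 'gids': ['2'], 'last_id': ['0'], 'page_size': ['20']}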

After that, we need to look at how the other parameters change:

last_id: the position of the last picture in this batch relative to all the pictures on the page (the next request continues from here)
page_size: the number of pictures returned in this batch
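To see how these two parameters drive paging, here is a minimal sketch, assuming (as the full code below does) that each request returns 20 posts and that advancing last_id by 20 makes the next request continue where the previous batch stopped:

import requests

url = 'https://bbs-api.mihoyo.com/post/wapi/forumGoodPostFullList'
header = {'User-Agent': 'Mozilla/5.0'}
last_id = 0
for batch in range(3):  # fetch three batches of 20 posts each
    param = {'forum_id': '29', 'gids': '2', 'last_id': last_id, 'page_size': '20'}
    data = requests.get(url, headers=header, params=param).json()
    print(batch, len(data['data']['posts']))  # how many posts came back
    last_id += 20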

Knowing this, you can start crawling. Raw JSON data is inconvenient to read after conversion, so it is recommended to import the pprint library to make the structure easier to view. I have analyzed each step of extracting the data in detail before; interested readers can take a look.
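For example, a quick way to pretty-print one response with pprint (same endpoint and parameters as the full code below):

import pprint
import requests

url = 'https://bbs-api.mihoyo.com/post/wapi/forumGoodPostFullList'
param = {'forum_id': '29', 'gids': '2', 'last_id': 0, 'page_size': '20'}
header = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=header, params=param)
# pprint indents the nested dictionaries so the JSON structure is easy to read
pprint.pprint(response.json())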

Portal (link to the previous article)

III. The complete code

# -*- coding: UTF-8 -*-
"""
@Author  : 远方的星
@Time    : 2021/3/8 19:25
@CSDN    : https://blog.csdn.net/qq_44921056
@Tencent Cloud: https://cloud.tencent.com/developer/column/91164
"""
import requests
import os
import json

# Create a folder to save the images
path = 'D:/原神同人画'
if not os.path.exists(path):
    os.mkdir(path)
# Build a request header
header = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'
}
# Ask the user how many pages to crawl
page = input('How many pages would you like to crawl: ')
page = int(page) + 1
n = 0
# image_id marks where crawling starts; one scroll loads 20 images by default
image_id = 0
for m in range(1, page):
    url = 'https://bbs-api.mihoyo.com/post/wapi/forumGoodPostFullList?'
    param = {
        'forum_id': '29',
        'gids': '2',
        'last_id': image_id,
        'page_size': '20'
    }
    # An empty list to hold the image URLs
    image_url = list()
    response = requests.get(url=url, headers=header, params=param)
    # Decode the response as utf-8
    response.encoding = 'utf-8'
    response = response.text
    # Convert the JSON string into a Python object
    data_s = json.loads(response, strict=False)
    a = data_s["data"]["posts"]  # extract the post list from the "data" field
    for i in range(len(a)):
        data = a[i].get("post").get("cover")  # the cover image URL of each post
        image_url.append(data)
    for image_src in image_url:
        # Download the image content
        image_data = requests.get(url=image_src, headers=header).content
        # Image file name
        image_name = '{}'.format(n + 1) + '.jpg'
        # Image save path
        image_path = path + '/' + image_name
        # Save the data (the with block closes the file automatically)
        with open(image_path, 'wb') as f:
            f.write(image_data)
            print(image_name, '==================> downloaded!')
        n += 1
    image_id += 20

IV. Results

(screenshots of the downloaded images)

I hope everyone will like, follow, and bookmark to show your support!

Author: 远方的星
CSDN: https://blog.csdn.net/qq_44921056
Tencent Cloud: https://cloud.tencent.com/developer/column/91164
This article is for learning and exchange only. Reprinting without the author's permission is prohibited, and it must not be used for any other purpose; violators will be held accountable.

Origin: https://blog.csdn.net/qq_44921056/article/details/114577739