Want audio data but can't download it in batches? Here is a Python trick to solve it~

foreword

Hello everyone, this is the Demon King~

Environment

  • Python 3.8
  • PyCharm

Modules used

  • requests >>> HTTP request module; third-party, install with pip install requests
  • re >>> regular-expression module for parsing data; built in, no installation needed

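Press Win + R, type cmd and press Enter, then run the install command pip install <module name>. If the installation fails with red error text, the network connection has most likely timed out; switch to a domestic mirror source, for example pip install -i <mirror URL> <module name>.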
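To confirm that an install actually worked, a quick check from Python can help. This is a minimal sketch using the standard library's importlib.util.find_spec, which returns None when a top-level module cannot be found; the helper name is_installed is hypothetical.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if module_name can be found in the current environment."""
    # find_spec() returns None (instead of raising) for a missing top-level module
    return importlib.util.find_spec(module_name) is not None


for name in ('requests', 're'):
    print(name, 'installed' if is_installed(name) else 'missing: run pip install ' + name)
```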

How would you go about implementing a crawler like this?

Analysis: first work out where the data we want comes from, i.e. the audio URL.

Capture and analyze the network packets with the browser's developer tools:

  1. Find the audio URL.
  2. Find the audio data packet.
    This data packet contains the audio URL we want.
  3. Compare how the request parameters change from one data packet to the next: each audio has its own audio ID.
  4. Search for that ID in the developer tools to find where it comes from: the web page's source code contains the audio IDs we want.
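Step 4 says the page source contains the audio IDs, and the re module listed above is the usual tool for pulling them out. A sketch under assumptions: the HTML fragment and the /sound/<id> link pattern below are made up for illustration, so check the real page source in the developer tools and adapt the pattern to whatever markup actually carries the ID.

```python
import re

# Hypothetical fragment of a catalog page; the real markup may differ
page_source = '''
<a href="/sound/101" title="Episode 1">Episode 1</a>
<a href="/sound/102" title="Episode 2">Episode 2</a>
'''

# (\d+) captures the audio ID (one or more digits); (.*?) captures the title non-greedily
pattern = r'<a href="/sound/(\d+)" title="(.*?)">'
# With two groups, findall returns a list of (ID, title) tuples
tracks = re.findall(pattern, page_source)
for audio_id, title in tracks:
    print(audio_id, title)
```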

So the code works backwards through that chain: to get the audio URL we need the audio data packet, and to request the audio data packet we need the audio ID.
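Working backwards like this means the only thing that changes between audio data packet requests is the ID parameter. A minimal sketch of that observation, reusing the play endpoint that appears in this article's code (the site may have changed it since); build_audio_packet_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Play endpoint taken from this article's code; it may no longer be current
PLAY_API = 'https://www.ximalaya.com/revision/play/v1/audio'


def build_audio_packet_url(audio_id):
    """Build the audio data packet URL for one audio ID; only 'id' varies between requests."""
    return f'{PLAY_API}?{urlencode({"id": audio_id, "ptype": 1})}'


for audio_id in (101, 102):
    print(build_audio_packet_url(audio_id))
```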

Code implementation steps: the crawler imitates a browser, sending requests to URLs to obtain data.
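Imitating a browser mostly means sending the request headers a browser would send, above all User-Agent. The article's code does this with requests by passing headers=headers; the sketch below shows the same idea with the standard library's urllib.request instead, so it runs without extra installs. It only builds the request and never sends it, and the example URL is a placeholder.

```python
from urllib.request import Request

# The same User-Agent string the article's code sends
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36'
}

# Build (but do not send) a GET request carrying browser-like headers
req = Request('https://example.com/', headers=HEADERS, method='GET')
print(req.get_method(), req.full_url)
# urllib stores header names via str.capitalize(), hence the key 'User-agent'
print(req.get_header('User-agent'))
# urllib.request.urlopen(req) would actually send it
```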

1. The first request obtains the audio IDs and audio titles.

  1. Send request: send a request to the audio catalog page.
  2. Get data: get the response data returned by the server.
  3. Parse data: extract the audio IDs and audio titles we want.

2. The second request obtains the audio URL.

  1. Send request: send a request to the audio data packet.
  2. Get data: get the response data returned by the server.
  3. Parse data: extract the audio URL.

3. The third request obtains the binary audio data so that it can be saved.

  1. Save data.
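Each of the three requests above ends with a small step that can be tested without a network: parse the track list, pull out the audio URL, and save the bytes. A minimal sketch of just those steps, with hypothetical function names and a made-up JSON payload; the field names (data → tracks → trackId/title, and data → src) are the ones the article's code reads, which the site may have changed since. Like that code, it takes the IDs from the getTracksList JSON rather than from the catalog page HTML.

```python
import json
import os
import re
import tempfile

# Made-up payload shaped like the getTracksList response the article's code reads
TRACK_LIST_TEXT = '''
{"data": {"tracks": [
    {"trackId": 101, "title": "Episode 1: Intro?"},
    {"trackId": 102, "title": "Episode 2"}
]}}
'''


def parse_track_list(text):
    """Request 1, parse step: return (audio ID, title) pairs from the track-list JSON."""
    data = json.loads(text)  # roughly what response.json() does
    return [(track['trackId'], track['title']) for track in data['data']['tracks']]


def extract_audio_url(json_data):
    """Request 2, parse step: return data.src, or None if it is missing or empty."""
    return (json_data.get('data') or {}).get('src') or None


def save_audio(content, title, folder='audio'):
    """Request 3, save step: write bytes to <folder>/<title>.mp3 and return the path."""
    os.makedirs(folder, exist_ok=True)  # create the folder if it does not exist yet
    safe_title = re.sub(r'[\\/:*?"<>|]', '_', title)  # characters Windows forbids in file names
    path = os.path.join(folder, safe_title + '.mp3')
    with open(path, mode='wb') as f:  # 'wb' because the audio is binary (response.content)
        f.write(content)
    return path


tracks = parse_track_list(TRACK_LIST_TEXT)
print(tracks)
print(extract_audio_url({'data': {'src': 'https://example.com/101.m4a'}}))
print(extract_audio_url({'data': {'src': None}}))
with tempfile.TemporaryDirectory() as tmp:
    # Fake bytes stand in for the downloaded audio
    print(os.path.basename(save_audio(b'fake audio', tracks[0][1], folder=tmp)))
```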

code

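import os
import re  # regular expressions, used here to clean up file names

import requests  # HTTP request module, third-party: pip install requests

# Sending a request
#     Which URL do we send the request to, and what kind of request do we send?
#
# You can think of a crawler sending a request as making a phone call:
# import requests   the tool: the phone
# url               the phone number
# headers           the signal
# get               the way of dialing (a landline needs an area code, e.g. 0731 for Changsha)
# <Response [200]>  the call went through: 200 is the ringing tone, 404 is "the number you dialed does not exist"

# Mimic a browser: the request headers are a dict of key-value pairs
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36'
}
os.makedirs('audio', exist_ok=True)  # open() below fails if the folder does not exist

for page in range(2, 17):  # pages 2 to 16; the end of range() is exclusive
    # url: Uniform Resource Locator
    url = f'https://www.ximalaya.com/revision/album/v1/getTracksList?albumId=8625924&pageNum={page}&sort=0'
    response = requests.get(url=url, headers=headers)
    # print(response) shows <Response [200]>: a response object; status code 200 means success, 404 means a wrong URL
    # re.findall(pattern, text) would extract matching content from text (\d+ matches one or more digits),
    # but this endpoint returns JSON, so the track list is read directly from the parsed dict
    audio_info = response.json()['data']['tracks']
    for index in audio_info:
        # f-string formatting: {} is a placeholder
        link = f'https://www.ximalaya.com/revision/play/v1/audio?id={index["trackId"]}&ptype=1'
        # Get the response object's JSON data as a dict
        json_data = requests.get(url=link, headers=headers).json()
        # response.text      str: the response body as text
        # response.json()    dict: the response body parsed as JSON
        # response.content   bytes: the binary content
        # (Ctrl + Alt + L formats code in PyCharm; it clashes with the NetEase Cloud Music
        #  "favorite" hotkey and the QQ "lock" hotkey)
        audio_url = (json_data.get('data') or {}).get('src')
        if not audio_url:  # no playable URL returned for this track, so skip it
            continue
        audio_content = requests.get(url=audio_url, headers=headers).content
        # Replace characters that Windows does not allow in file names
        title = re.sub(r'[\\/:*?"<>|]', '_', index['title'])
        with open(os.path.join('audio', title + '.mp3'), mode='wb') as f:
            f.write(audio_content)
        print(link, title)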


# Build every four-digit string from '0000' to '9999'
def get_num():
    lis = []
    for a in range(10):
        for b in range(10):
            for c in range(10):
                for d in range(10):
                    num = f'{a}{b}{c}{d}'  # e.g. a=0, b=0, c=1, d=2 gives '0012'
                    lis.append(num)
    return lis


print(get_num())

string = '1111122211'
string = string.replace('2', '3')
print(string)
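The nested loops in get_num produce every four-digit string from 0000 to 9999. The same list can be built more compactly with zero-padded formatting or with itertools.product; the sketch below shows both (get_num_product is a hypothetical name) and they yield the same order as the loops. The last line shows the optional count argument of str.replace, which limits how many matches are replaced.

```python
from itertools import product


def get_num():
    """All 4-digit strings from '0000' to '9999', in the same order as the nested loops."""
    return [f'{n:04d}' for n in range(10000)]


def get_num_product():
    """Same list built from the Cartesian product of four digit sets."""
    return [''.join(digits) for digits in product('0123456789', repeat=4)]


nums = get_num()
print(len(nums), nums[:3], nums[-1])  # 10000 ['0000', '0001', '0002'] 9999

# str.replace(old, new, count): replace only the first '2'
print('1111122211'.replace('2', '3', 1))  # 1111132211
```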

epilogue

Well, that's the end of this article!

If you have more suggestions or questions, feel free to leave a comment or send me a private message! Let's work hard together (ง •_•)ง

If you liked it, follow me, or like and comment on the article!!!


Origin blog.csdn.net/python56123/article/details/124174552