Python QQ music crawling

Python QQ music crawling

Yes, as you can see, this is an article about Python crawlers. I hope you clarify the following points before looking at the code.
1. The website crawled by this crawler is the official website of qq music on the PC side.
2. The crawled music is only officially allowed for us to listen to, and does not involve paid music such as Vip.
3. The code and explanation are oriented to those who have a certain python crawler practical basis. Partner
4. This code was actually written a long time ago, but you can still climb to the song, and I won’t know it later.
5. This article and the code in it are only used for teaching and communication and do not profit from it.

After nagging so much, I went directly to the code to explain

import requests
import json
from lxml import etree
import os
#导入要用到的包和模块

I believe you who read this article have experience in crawling. Nevertheless, I still recommend using the xpath in requests and lxml to obtain web page data, and use Google Chrome for packet capture.

"""这次我们从程序最开始讲起"""
if __name__=='__main__':
"""添加headers,一般其中含有User-Agent就足够,不够的话自己试着添加Cookie、Referer和其他
请按照自己电脑的User-Agent自行更改"""
    headers={
    
    
        'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) \
    Chrome/70.0.3538.102 Safari/537.36 Edge/18.18362'
        }
"""一个死循环让你可以一直输入歌手名进行爬歌"""     
    while(True):
        singer_name=input("请输入歌手名:")
        singer_id=get_singer_id(singer_name)
        qqmusicOfsinger(singer_id)
        print('爬取成功!')

Next, I will tell you how get_singer_id() and qqmusicOfsinger() are written

"""看到这个 url 是不是有点可怕,按道理应该是可以精简的,不过我懒嘻嘻
这个url的获得是需要一点抓包经验的,你可以输入一个歌手试试,然后跟踪着找到含有其名字的那些网址
(xxxxxx)的部分其实我的QQ号,因为是qq音乐嘛,实际上不登录也可以的
"""
def get_singer_id(singer_name):
    url='https://c.y.qq.com/soso/fcgi-bin/client_search_cp?ct=24&qqmusic_ver=1298&new_json=1&remoteplace=txt.yqq.song&searchid=62984089150681434&t=0&aggr=1&cr=1&catZhida=1&lossless=0&flag_qc=0&p=1&n=10&w={}&g_tk=522847069&loginUin=(xxxxxxxxxx)&hostUin=0&format=json&inCharset=utf8&outCharset=utf-8&notice=0&platform=yqq.json&needNewCode=0'.format(singer_name)
    response=requests.get(url,headers=headers)#获取歌手的id
    data=json.loads(response.text) #转化成字典形式便于找到id
    singer_id=data['data']['zhida']['zhida_singer']['singerMID']
    return singer_id
    

The code behind is a bit long and looks unoptimized. In fact, it is intentional to reduce the logic errors of the code. Calling more functions may not be a good thing, and it may make your code logic very messy.

def qqmusicOfsinger(singer_id):
"""根据歌手id访问qq音乐网站"""
    start_url='https://y.qq.com/n/yqq/singer/{}.html'.format(singer_id)
    response = requests.get(start_url,headers=headers)
    html_ele=etree.HTML(response.text)#获取源代码数据
    #获取目录下的歌曲id
    song_hrefs=html_ele.xpath('//span[@class="songlist__songname_txt"]/a/@href')
    #获取歌曲名,以便于保存
    song_names=html_ele.xpath('//span[@class="songlist__songname_txt"]/a/@title')
"""上面这些其实都已经是套路来的了,记下这个xpath解析网站基本就够用了"""
    song_ids=[]
    for href in song_hrefs:
        id=href.split('/')[-1].split('.')[0]#提取歌曲id
        song_ids.append(id) 
    #构造能找到源音频的网址
    #记录歌名的编号
    index=0
"""确实说实话,抓包获取这些url真的是一个漫长的过程,别看代码好像都不长,其实背后付出的汗水和时间是真的多,向那些默默无闻的工作者敬个礼"""
    for song_id in song_ids:
        data_url='https://u.y.qq.com/cgi-bin/musicu.fcg?&data=%7B%22req%22%3A%7B%22module%22%3A%22CDN.SrfCdnDispatchServer%22%2C%22method%22%3A%22GetCdnDispatch%22%2C%22param%22%3A%7B%22guid%22%3A%227157685530%22%2C%22calltype%22%3A0%2C%22userip%22%3A%22%22%7D%7D%2C%22req_0%22%3A%7B%22module%22%3A%22vkey.GetVkeyServer%22%2C%22method%22%3A%22CgiGetVkey%22%2C%22param%22%3A%7B%22guid%22%3A%227157685530%22%2C%22songmid%22%3A%5B%22{}%22%5D%2C%22songtype%22%3A%5B0%5D%2C%22uin%22%3A%221334724312%22%2C%22loginflag%22%3A1%2C%22platform%22%3A%2220%22%7D%7D%2C%22comm%22%3A%7B%22uin%22%3A1334724312%2C%22format%22%3A%22json%22%2C%22ct%22%3A24%2C%22cv%22%3A0%7D%7D'\
                  .format(song_id)
       
        res=requests.get(data_url,headers=headers)
        data=res.json()
        #pprint(data) #这里就可以看出我打码的时候也不是一气呵成的,是不断从获得数据中探寻的
        song_url=data['req_0']['data']['midurlinfo'][0]['purl']
        #源音频的网址的拼接
        song_url='http://ws.stream.qqmusic.qq.com/'+song_url
       """终于到激动人心的歌曲下载了"""
        result=requests.get(song_url,headers=headers)
        song_name=song_names[index]
        os.makedirs('qqmusic',exist_ok=True)#我们在代码的同目录下新建一个qqmusic的文件夹用于存歌曲
        with open("qqmusic\{}.m4a".format(song_name),"wb")as f:#要点,这些歌曲音频等数据一定要用wb
            f.write(result.content)#下载内容
        print(song_name+'下载成功')
        index+=1

At this point, the work is done. Later, you can use pyinstaller to package the code into an exe file and share it with your friends. At the beginning, you didn’t know the benefits of packaging. Now you know, the other party’s computer can run without python. Is it very convenient? .
If you really want a prostitute, you can visit the packaged exe resource I uploaded. If you have any questions, please ask.

I hope you can gain something. If you like it, please like it, pay attention to it, and it will be updated later, bye.

Guess you like

Origin blog.csdn.net/weixin_43594279/article/details/105800752