Implementing a Kuwo Music crawler in 40 lines of Python

Without further ado, here is the code:

```python
import os
import re

import requests


class Spider:
    def __init__(self):
        self.singer_name = input('Enter the name of the singer to crawl: ')
        self.pages = int(input('Enter the number of pages to crawl (30 songs per page): '))
        # Create the output folder; exist_ok avoids a crash if it already exists
        os.makedirs(self.singer_name, exist_ok=True)
        # Request headers copied from the browser's developer tools
        self.headers = {'Accept': 'application/json, text/plain, */*',
                        'Accept-Encoding': 'gzip, deflate',
                        'Accept-Language': 'zh-CN,zh;q=0.9',
                        'Connection': 'keep-alive',
                        'Cookie': '_ga=GA1.2.1637941648.1616934252; uname3=qq1616934321; t3kwid=131286315; websid=1488073791; pic3=""; t3=qq; Hm_lvt_cdb524f42f0ce19b169a8071123a4797=1617949101,1618127723,1618579672,1619099581; _gid=GA1.2.1505163314.1619099581; Hm_lpvt_cdb524f42f0ce19b169a8071123a4797=1619100738; _gat=1; kw_token=XM5GXCP8M5',
                        'csrf': 'XM5GXCP8M5',
                        'Host': 'www.kuwo.cn',
                        'Referer': 'http://www.kuwo.cn/search/list?key=%E5%91%A8%E6%9D%B0%E4%BC%A6',
                        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3861.400 QQBrowser/10.7.4313.400'}

    def main(self):
        for page in range(self.pages):
            # Change the page number (pn) on every iteration
            print('Crawling songs on page {}!'.format(page + 1))
            url = 'http://www.kuwo.cn/api/www/search/searchMusicBykeyWord?key={}&pn={}&rn=30&httpsStatus=1&reqId=b4274401-a377-11eb-a99d-ef0323beeee3'.format(
                self.singer_name, page + 1)
            response = requests.get(url, headers=self.headers)
            # The JSON response holds the song information; drill down to get name and rid
            result = response.json()
            song_list = result['data']['list']
            for song in song_list:
                song_name = song['name']
                song_rid = song['rid']
                # Changing rid gives the mp3 address of a different song
                song_json_url = 'http://www.kuwo.cn/url?format=mp3&rid={}&response=url&type=convert_url3&br=128kmp3&from=web&t=1619102008389&httpsStatus=1&reqId=b4280751-a377-11eb-a99d-ef0323beeee3'.format(
                    song_rid)
                print('Crawling {}...'.format(song_name))
                # Ask for the song's mp3 address
                song_url = requests.get(song_json_url, headers=self.headers).json()['url']
                # Replace characters that are not allowed in file names
                file_name = re.sub(r'[\\/:*?"<>|]', '_', song_name)
                # Save the binary response (the mp3 file) locally
                with open(os.path.join(self.singer_name, file_name + '.mp3'), 'wb') as wstream:
                    wstream.write(requests.get(song_url).content)
                print('Downloaded successfully!')


if __name__ == '__main__':
    music = Spider()
    music.main()
    # Keep the console window open on Windows (e.g. when run as a packaged exe)
    if os.name == 'nt':
        os.system('pause')
```

First, go to the official Kuwo Music website: http://www.kuwo.cn/

Type the name of the singer you want to crawl into the search bar:

On the results page, right-click and choose Inspect to open the developer tools:

Click through in the order shown in the figure below:

You can see that the response here is in JSON format. Next, let's parse the JSON.

Copy this response into a JSON formatter, then expand the nodes one level at a time:
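Instead of pasting the response into an online formatter, Python's standard json module can re-indent it. This is a minimal sketch; the sample payload is a hypothetical, trimmed-down stand-in for the real response.

```python
import json


def pretty_print(raw_text):
    """Parse a raw JSON response and return it re-indented for reading."""
    data = json.loads(raw_text)
    # ensure_ascii=False keeps Chinese titles readable instead of \uXXXX escapes
    return json.dumps(data, indent=2, ensure_ascii=False)


# Hypothetical, trimmed-down stand-in for the real search response
sample = '{"data": {"list": [{"name": "示例歌曲", "rid": 123}]}}'
print(pretty_print(sample))
```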

All the song information for this page is stored in 'list' (under 'data'). Next, expand one of the songs:

Here you can find the song's rid and name, the two pieces of information we need. The rid lets us locate the song's mp3 file, and the name is used as the file name when the song is downloaded.
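The drill-down described above (data, then list, then each song's name and rid) fits in a small function. This sketch assumes the response layout seen in the developer tools; the sample page is hypothetical.

```python
def extract_songs(payload):
    """Return (name, rid) pairs from one page of search results.

    Assumes payload['data']['list'] is a list of dicts that each carry
    at least 'name' and 'rid'; missing levels yield an empty list.
    """
    songs = (payload.get('data') or {}).get('list') or []
    return [(song['name'], song['rid']) for song in songs]


# Hypothetical page with two songs; real entries carry many more fields
sample_page = {'data': {'list': [{'name': 'Song A', 'rid': 111, 'artist': 'X'},
                                 {'name': 'Song B', 'rid': 222, 'artist': 'X'}]}}
print(extract_songs(sample_page))  # [('Song A', 111), ('Song B', 222)]
```

Using get() with an empty fallback means an error response without 'data' produces no songs instead of a KeyError.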

Go back to the previous page:

The response to this URL contains the song's playback address, that is, the URL of the mp3 file. If you copy that song URL into the address bar and open it, you land on the playback page for the song:

Changing the rid naturally gives a different song URL. Requesting one of these song URLs directly returns the mp3 file itself. Once it is saved locally in binary form, any player can open and play it. You can also package the program into an exe file and share it with your friends.
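The two steps above, building the playback-address URL for a given rid and saving the mp3 bytes, can be sketched as follows. The query parameters are copied from the author's URL with only rid varying; the helper names play_api_url and save_mp3 are hypothetical.

```python
import os
import re
from urllib.parse import urlencode

PLAY_API = 'http://www.kuwo.cn/url'


def play_api_url(rid):
    """Build the playback-address URL for one song; only rid changes."""
    params = {'format': 'mp3', 'rid': rid, 'response': 'url',
              'type': 'convert_url3', 'br': '128kmp3', 'from': 'web',
              't': '1619102008389', 'httpsStatus': 1,
              'reqId': 'b4280751-a377-11eb-a99d-ef0323beeee3'}
    return PLAY_API + '?' + urlencode(params)


def save_mp3(folder, song_name, content):
    """Write the mp3 bytes to folder/<song_name>.mp3 and return the path."""
    # Characters such as / or ? are not allowed in Windows file names
    safe_name = re.sub(r'[\\/:*?"<>|]', '_', song_name)
    path = os.path.join(folder, safe_name + '.mp3')
    with open(path, 'wb') as f:
        f.write(content)
    return path
```

In the crawler, the bytes passed to save_mp3 would be the .content of a requests.get() on the 'url' field returned by play_api_url(rid).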

Here we request three URLs in total:

1. The search URL, whose response stores the song information for the current page (rid, name, and so on).

2. The playback-address URL, whose response stores the mp3 URL of the song with a given rid.

3. The mp3 URL, whose response is the song itself.

Every request must carry the request headers; otherwise the server will not return the information:

Save these request headers as a dictionary in the code (if you'd rather not copy them yourself, you can use the author's).
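In the author's headers, the csrf value (XM5GXCP8M5) is identical to the kw_token value inside the Cookie. Assuming the server checks that the two match, the sketch below copies kw_token into csrf so only the cookie string has to be updated. The function names are hypothetical, and the other headers from the author's dictionary (Referer, Accept-Language and so on) can be merged in the same way.

```python
def token_from_cookie(cookie, name='kw_token'):
    """Return the value of one cookie from a raw 'a=1; b=2' Cookie header."""
    for part in cookie.split(';'):
        key, sep, value = part.strip().partition('=')
        if sep and key == name:
            return value
    return None


def build_headers(cookie, user_agent):
    """Assemble request headers, copying kw_token into the csrf header."""
    token = token_from_cookie(cookie)
    if token is None:
        raise ValueError('kw_token not found in cookie')
    return {
        'Accept': 'application/json, text/plain, */*',
        'Cookie': cookie,
        'csrf': token,  # assumption: must equal the kw_token cookie
        'Host': 'www.kuwo.cn',
        'User-Agent': user_agent,
    }
```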

In the search URL, key is the keyword you searched for, pn is the page number, and rn is the number of songs per page. Changing them lets us fetch songs from different singers and control how many we get:
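Building the search URL from key, pn and rn with urllib.parse.urlencode takes care of percent-encoding the keyword. For example, 周杰伦 becomes %E5%91%A8%E6%9D%B0%E4%BC%A6, the same form as in the author's Referer header. The reqId value is copied from the author's URL; search_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

SEARCH_API = 'http://www.kuwo.cn/api/www/search/searchMusicBykeyWord'


def search_url(keyword, page, per_page=30):
    """key = search keyword, pn = page number (1-based), rn = songs per page."""
    params = {'key': keyword, 'pn': page, 'rn': per_page, 'httpsStatus': 1,
              'reqId': 'b4274401-a377-11eb-a99d-ef0323beeee3'}
    return SEARCH_API + '?' + urlencode(params)


# Second page of results, 10 songs per page
print(search_url('周杰伦', 2, 10))
```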

 

The overall idea is fairly clear. First, request the URL that stores the song information (changing its keyword and page number gets songs from different singers, and more or fewer of them). Next, request the URL that stores each song's playback address (changing its rid value gets the address of a different song). Finally, request the song's mp3 file itself and save the binary response locally.
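The three-request flow can be sketched as one function whose network calls are passed in. That way the flow can be checked with fake fetchers, and for real use the fetchers can wrap requests.get(url, headers=...) calls. Unlike the author's URLs, the URL builders here omit the reqId and t values. That is an assumption; copy them back in if the server rejects the request. The function crawl and its parameters are hypothetical.

```python
from urllib.parse import urlencode

BASE = 'http://www.kuwo.cn'


def crawl(keyword, pages, get_json, get_bytes, save, per_page=30):
    """Run the three requests for every song on pages 1..pages.

    get_json(url) -> dict, get_bytes(url) -> bytes, save(name, content).
    Returns the names of the songs that were saved, in order.
    """
    saved = []
    for page in range(1, pages + 1):
        # Request 1: one page of search results (each song has 'name' and 'rid')
        search = BASE + '/api/www/search/searchMusicBykeyWord?' + urlencode(
            {'key': keyword, 'pn': page, 'rn': per_page, 'httpsStatus': 1})
        for song in get_json(search)['data']['list']:
            # Request 2: the playback address for this rid
            play = BASE + '/url?' + urlencode(
                {'format': 'mp3', 'rid': song['rid'], 'response': 'url',
                 'type': 'convert_url3', 'br': '128kmp3', 'from': 'web',
                 'httpsStatus': 1})
            mp3_url = get_json(play)['url']
            # Request 3: the mp3 file itself, saved as binary
            save(song['name'], get_bytes(mp3_url))
            saved.append(song['name'])
    return saved


# Fake fetchers stand in for the network so the flow can be checked offline
def fake_get_json(url):
    if 'searchMusicBykeyWord' in url:
        return {'data': {'list': [{'name': 'Song A', 'rid': 1}]}}
    return {'url': 'http://example.com/a.mp3'}


files = {}
names = crawl('any singer', 2, fake_get_json, lambda url: b'mp3-bytes', files.__setitem__)
print(names)  # ['Song A', 'Song A']
```

Passing the fetchers in keeps the three requests in one readable loop while leaving headers, retries and file naming to the caller.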

Skills needed: basic Python syntax and data types, and simple use of the requests library.

This is my first post. I hope it helps friends who have learned some Python basics and can serve as a first small project in web crawling. If you have questions, feel free to leave a message in the comments. I hope everyone will support it!

If you spot any mistakes or omissions, please let me know so I can correct them.

If anything here infringes on your rights, please contact me and I will remove it.


Source: blog.csdn.net/m0_52726759/article/details/119145600