Python batch crawls the music of Chinese Uranus superstar Jay Chou

Preface

The text and pictures in this article are from the Internet and are for learning and communication purposes only. They do not have any commercial use. If you have any questions, please contact us for processing.

PS: If you need Python learning materials, you can click on the link below to get it yourself

Python free learning materials and group communication answers Click to join

The little friend said that he wanted to listen to Jay Chou’s music. There are websites where he can listen to Jay Chou’s music for free. Then he found that Migu Music can listen to Jay Chou’s songs for free. Since he can listen to Jay Chou’s songs for free, wouldn’t he be able to climb~

Basic development environment

  • Python 3.6
  • Pycharm
import requests
import parsel

Related modules pipcan be installed

Insert picture description here

Target website analysis

Insert picture description here
Click the play button, it will automatically jump to the music playback page

Insert picture description here
There is a download button on the playback interface, click to download.

Need to login account

Insert picture description here

  • Open developer tools
  • Select network
  • Click to download now

There will be a download data interface, a post request data interface, and the data returned in it carries the real audio address.
Insert picture description here
Insert picture description here
Insert picture description here
Copy the url address, it will automatically download the file to the local

Since it is a post request, just look at changes in data parameters to see if it needs to pass those parameters
Insert picture description here
multi-view download a few songs hydrogen, can be found copyrightIdis the ID value of each piece of music, you only need to get the ID value of each song, you You can download music.

So back to Jay Chou’s music list page

Insert picture description here
It can be found that the music list page is a static website, and you can directly use requests to request the website to parse the website data, and you can obtain the ID value and title of the music.

Now there is the last question left, that is, page turning, multi-page access.

For page-turning crawling, you only need to click on the next page, check the change of the url address, and find the corresponding change rule.

Insert picture description here
page is the corresponding page number, so the page turning and crawling are also done, the next step is to write the code

1. Request the webpage to get the ID value and title of the music

I won’t bring the cookie. You can log in to Migu Music and copy the one in the developer tool.

ef get_mp3_info(url):
    headers = {
    
    
        'cookie': '',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'
    }
    response = requests.get(url=url, headers=headers)
    selector = parsel.Selector(response.text)
    lis = selector.css('#J_PageSonglist > div.songlist-body > div')
    for li in lis:
        page_url = li.css('.song-name-txt::attr(href)').get()
        mp3_id = page_url.split('/')[-1]
        title = li.css('.song-name-txt::attr(title)').get()

2. Post request to get music download address

Here you don’t need to write so many headers parameters, just copy and paste them directly for convenience, because it is a post request, some parameters are necessary, otherwise the desired return result will be obtained.

def get_mp3_url(mp3_id, title):
    url = 'https://music.migu.cn/v3/api/order/download'
    headers = {
    
    
        'authority': 'music.migu.cn',
        'method': 'POST',
        'path': '/v3/api/order/download',
        'scheme': 'https',
        'accept': '*/*',
        'accept-encoding': 'gzip, deflate, br',
        'accept-language': 'zh-CN,zh;q=0.9',
        'cache-control': 'no-cache',
        'content-length': '42',
        'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'cookie': '',
        'origin': 'https://music.migu.cn',
        'pragma': 'no-cache',
        'referer': 'https://music.migu.cn/v3/music/order/download/60054701923',
        'sec-fetch-dest': 'empty',
        'sec-fetch-mode': 'cors',
        'sec-fetch-site': 'same-origin',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36',
        'x-requested-with': 'XMLHttpRequest',
    }

    data = {
    
    
        'copyrightId': '{}'.format(mp3_id),
        'payType': '01',
        'type': '1'
    }
    response = requests.post(url=url, data=data, headers=headers)
    html_data = response.json()
    mp3_url = html_data['downUrl']

3. Save music to local

Saving the code is relatively simple and commonly used with open

def download(download_url, title):
    response = requests.get(url=download_url)
    path = '音乐\\' + title + '.mp3'
    with open(path, mode='wb') as f:
        f.write(response.content)

Specific effect

Part of the music still needs to be paid, so when you post to request paid music, there is no download address, you can write a judgment
Insert picture description here
Insert picture description here

to sum up

The code can be optimized. This is just the simplest version of the crawler code. The download speed is not very fast. You can use multi-threaded crawling, which is faster. You can optimize it yourself.

Crawlers are not difficult, mainly to analyze websites, unless it involves heavily encrypted websites, such as font encryption and JS data encryption. Most of these basic websites only need to think about analyzing the data of the website before they can be crawled.

It’s actually not difficult to encrypt fonts, mainly because the crawling process is a bit complicated. Come on

Guess you like

Origin blog.csdn.net/fei347795790/article/details/109742327