[Python in practice] Collecting emotional audio with Python

The world of adults is really not easy.
Sadness is always greater than joy.
Love is happy because of ignorance,
but it has entered into a complicated and confusing marriage.

foreword

Lately I have been listening to a lot of emotional programs, marriage programs for example; maybe I am getting old. So I looked into how to download the audio and save it to my phone, so I can listen to it in my spare time.

environment use

  • python 3.9
  • pycharm

module use

  • requests

module introduction

  • requests

        requests is a practical third-party HTTP client library for Python. It is commonly used in crawlers and for testing how servers respond. It exists specifically to send HTTP requests and is much simpler to use than urllib.

  • parsel

        parsel is a third-party Python library that combines CSS selectors, XPath, and regular expressions.

parsel is developed by the Scrapy team, who split it out of Scrapy as a standalone library. It parses HTML and XML content and makes it easy to extract the data you need.

Compared with BeautifulSoup, XPath with parsel is generally considered faster and easier to use.

  • re

        The re module is Python's standard-library module for matching strings with regular expressions. Most of its functions take a pattern, match it against a string, and extract the parts you need. Regular expression syntax is largely the same across programming languages.

  • os

        os is the abbreviation of "operating system". As the name suggests, the os module provides interfaces for various Python programs to interact with the operating system. By using the os module, on the one hand, it can easily interact with the operating system, and on the other hand, it can greatly enhance the portability of the code.

  • csv

        CSV is a file format, short for comma-separated values, that can be opened with Excel or a plain text editor. Fields are separated by half-width (ASCII) commas, although other delimiters can be used, and Excel shows each field in its own column. A csv file stores tabular data as plain text and works on every operating system.
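As a small illustration of the os and csv modules working together, the sketch below writes a few (track id, title) rows to a CSV file and reads them back. The rows, the directory name, and the file name are made-up placeholders, not data from the site.

```python
import csv
import os
import tempfile

# Made-up sample rows; real values would come from the scraped page.
rows = [("1001", "Episode one"), ("1002", "Episode two, part 2")]

# A temporary directory keeps the example from touching real files.
out_dir = os.path.join(tempfile.mkdtemp(), "tracks")
os.makedirs(out_dir, exist_ok=True)
csv_path = os.path.join(out_dir, "tracks.csv")

# newline="" lets the csv module control line endings itself.
with open(csv_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["trackId", "title"])
    writer.writerows(rows)

# Read the file back; the comma inside the second title survives
# because the writer quoted that field.
with open(csv_path, newline="", encoding="utf-8") as f:
    loaded = list(csv.reader(f))

print(loaded)
```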

Module installation problem:

  • To install a third-party Python module:

Press win + R, type cmd and click OK, then type the install command pip install followed by the module name (for example, pip install requests) and press Enter.

Alternatively, click Terminal in PyCharm and enter the same install command there.

  • Reasons an installation can fail:

  • Failure 1: pip is not recognized as an internal command

                Solution: add Python and its Scripts directory to the PATH environment variable

  • Failure 2: a wall of red error text (read timed out)

                Solution: the network connection timed out, so switch to a mirror source

   

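    Tsinghua University: https://pypi.tuna.tsinghua.edu.cn/simple
    Alibaba Cloud: https://mirrors.aliyun.com/pypi/simple/
    University of Science and Technology of China: https://pypi.mirrors.ustc.edu.cn/simple/
    Huazhong University of Science and Technology: https://pypi.hustunique.com/
    Shandong University of Technology: https://pypi.sdutlinux.org/
    Douban: https://pypi.douban.com/simple/
    For example: pip3 install -i https://pypi.doubanio.com/simple/ module-name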
  • Failure 3: cmd reports that the module is already installed, or that the installation succeeded, but it still cannot be imported in PyCharm

                Solution: you probably have more than one Python installed (for example both Anaconda and a standalone Python), and pip installed the module into a different one than PyCharm uses. Either uninstall one of them or set the Python interpreter in PyCharm to the one that has the module.
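To diagnose the third failure, it can help to ask Python itself which interpreter is running and whether it can see the module. The following is a minimal sketch using only the standard library; module_available is a hypothetical helper name chosen for this example.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if `name` can be imported by the current interpreter."""
    return importlib.util.find_spec(name) is not None


# The interpreter that is actually running this script. If pip installed
# into a different one, the import fails even though pip reported success.
print("interpreter:", sys.executable)
print("requests importable:", module_available("requests"))
```

If the printed path is not the interpreter you expected, running python -m pip install requests with that same interpreter installs the module where the script can find it.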

Code

send request

First of all, we need to determine our target URL; what we ultimately want is the address of each audio file.

We send a request to get the source code of the web page. Most readers will already know how to write this part.

import requests

url = 'https://www.ximalaya.com/album/37453303'

# a browser-like User-Agent so the request looks like a normal page visit
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36',
}

res = requests.get(url, headers=headers)

requests is a third-party Python module for sending HTTP requests. In this example, we call requests.get() to send a GET request to https://www.ximalaya.com/album/37453303, passing the request headers as a parameter. The function returns a response object, and res.text holds the response body (the page source).
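A request can hang or come back with an error page, so a slightly more defensive variant may be worth considering. The sketch below is one possible shape, not the article's method: fetch_html, HEADERS, and the 10-second timeout are assumptions made for illustration, while timeout= and raise_for_status() are standard parts of requests.

```python
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36",
}


def fetch_html(url, headers=HEADERS, timeout=10):
    """Hypothetical helper: GET a page and return its text, or raise on failure."""
    # Imported here so the helper can be defined even where requests is missing.
    import requests

    res = requests.get(url, headers=headers, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx responses instead of silently
    # returning an error page that a later regex would fail on.
    res.raise_for_status()
    return res.text
```

Calling fetch_html with the album URL would return the page source as a string, or raise an exception that makes the failure visible immediately.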

retrieve data

info_list = re.findall(r'"seoTrackList":\[(.*?)\]', res.text)[0]
print(info_list)

re.findall() finds all matching substrings in a string and returns them as a list. In this example we use it to find the "seoTrackList" array in the response body (the same key the complete code at the end uses), escaping the square brackets so they match literally, and take the first match with the index [0].

We store that match in the variable info_list and print it with print().

The matched content contains the data we want: the id and title of every track. We extract both next.

Note that the page as a whole is not JSON, so we match with regular expressions instead of parsing it.

trackIds = re.findall(r'"trackId":(\d+)', info_list)
# print(trackIds)
titles = re.findall(r'"trackName":"(.*?)"', info_list)
# print(titles)
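The extraction can be tried offline on a small string. The sketch below uses the key names from the complete code at the end of this article (seoTrackList, trackId, trackName); the sample_html fragment itself is made up for illustration and is not copied from the real page.

```python
import re

# Made-up fragment shaped like the embedded track list.
sample_html = (
    '{"albumId":1,"seoTrackList":['
    '{"trackId":1001,"trackName":"First episode"},'
    '{"trackId":1002,"trackName":"Second episode"}'
    '],"other":[]}'
)

# \[ and \] match literal brackets; (.*?) lazily captures what lies between.
matches = re.findall(r'"seoTrackList":\[(.*?)\]', sample_html)
if not matches:
    raise ValueError("track list not found; the page layout may have changed")
track_block = matches[0]

track_ids = re.findall(r'"trackId":(\d+)', track_block)
titles = re.findall(r'"trackName":"(.*?)"', track_block)

print(track_ids)
print(titles)
```

Checking that matches is non-empty before indexing gives a clear error message instead of an IndexError when the page structure changes.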

analyze data

By comparing URLs, we found that the trackId is all we need to reach the audio directly. There is not much more to explain here.

audio = f'https://www.ximalaya.com/revision/play/v1/audio?id={trackId}&ptype=1'

We only need to substitute the trackId. Requesting the URL above returns the address of the audio. Next, we write the code.

for trackId, title in zip(trackIds, titles):
    audio = f'https://www.ximalaya.com/revision/play/v1/audio?id={trackId}&ptype=1'
    print(audio)
    audio_res = requests.get(audio, headers=headers)
    audio_url = audio_res.json()['data']['src']
    print(audio_url)

zip() pairs up the elements of two lists position by position. In this example, we use zip() to pair trackIds with titles, and the for loop unpacks each pair into the variables trackId and title.

We then insert trackId into the URL with an f-string and store the result in the variable audio.

Next, we call requests.get() to send a GET request to audio, passing the request headers as a parameter. Finally, we call the json() method to parse the JSON response body into a Python dictionary, use ['data']['src'] to read the value of the src field, and store it in the variable audio_url.
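Both steps can be checked without touching the network. In the sketch below, the ids, the titles, and the sample_body string are made up; only the nesting of data and src follows the code above. Using .get() instead of direct indexing is a design choice for this example, so that a missing key yields None rather than a KeyError.

```python
import json

track_ids = ["1001", "1002"]
titles = ["First episode", "Second episode"]

# zip() yields one tuple per position and stops at the shorter input.
pairs = list(zip(track_ids, titles))
print(pairs)

# Made-up response body with the same nesting as ['data']['src'].
sample_body = '{"ret": 200, "data": {"trackId": 1001, "src": "https://example.com/a.m4a"}}'
payload = json.loads(sample_body)

# .get() returns None instead of raising KeyError when a key is missing.
src = payload.get("data", {}).get("src")
print(src)
```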

save data

music_content = requests.get(audio_url, headers=headers).content

# the music directory must already exist (the complete code creates it)
with open('music/' + title + '.mp3', mode='wb') as f:
    f.write(music_content)
    print(title, 'saved successfully')

Next, we request the audio URL and save the binary content locally. The with open() statement closes the file automatically, so it is properly closed after use even if an error occurs. In this example, we use with open() to open a file under the music directory, named after the title with an .mp3 extension, in binary write mode, and bind the file object to a variable.

We then use the write() method to write the audio content to the file.
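Titles taken from a web page can contain characters that are not allowed in file names, for example ? or / on Windows, in which case open() fails. One possible guard is sketched below; safe_filename and build_path are hypothetical helpers introduced for this example, not part of the original code.

```python
import os
import re


def safe_filename(title, replacement="_"):
    """Hypothetical helper: replace characters Windows forbids in file names."""
    cleaned = re.sub(r'[\\/:*?"<>|]', replacement, title).strip()
    # Fall back to a fixed name if nothing usable is left.
    return cleaned or "untitled"


def build_path(directory, title, ext=".mp3"):
    """Join the directory and the cleaned title into a full file path."""
    return os.path.join(directory, safe_filename(title) + ext)


print(safe_filename("Why marriage? Part 1/2"))
print(build_path("music", "a:b"))
```

The result of build_path can be passed to open() in place of the concatenated string.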

complete code

import os
import re
import requests

# create the output directory if it does not exist yet
save_dir = 'music'
os.makedirs(save_dir, exist_ok=True)

url = 'https://www.ximalaya.com/album/37453303'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36',
}
res = requests.get(url, headers=headers)
# the track list is embedded in the page source as a JSON-like array
seoTrackList = re.findall(r'"seoTrackList":\[(.*?)\]', res.text)[0]
# print(seoTrackList)
trackIds = re.findall(r'"trackId":(\d+)', seoTrackList)
# print(trackIds)
titles = re.findall(r'"trackName":"(.*?)"', seoTrackList)
# print(titles)
for trackId, title in zip(trackIds, titles):
    # audio = 'https://www.ximalaya.com/revision/play/v1/audio?id=621413024&ptype=1'
    audio = f'https://www.ximalaya.com/revision/play/v1/audio?id={trackId}&ptype=1'
    print(audio)
    audio_res = requests.get(audio, headers=headers)
    # print(audio_res)
    audio_url = audio_res.json()['data']['src']
    print(audio_url)

    # download the audio bytes and write them to music/<title>.mp3
    music_content = requests.get(audio_url, headers=headers).content
    with open(os.path.join(save_dir, title + '.mp3'), mode='wb') as f:
        f.write(music_content)
        print(title, 'saved successfully')

Summary

This is a practical Python example of downloading audio and saving it locally, from where it can be copied to your phone. We first determine the target URL, then call requests.get() to send a GET request to that URL, passing the request headers as a parameter.

After the request succeeds, we extract each track's id and title from the page source with regular expressions and request the audio endpoint for each id. We call the json() method to parse the response body, read the src field with ['data']['src'], and store it in the variable audio_url.

Finally, we request that address, save the binary content, and the file is ready to play.


Origin blog.csdn.net/BROKEN__Y/article/details/131093206