The world of adults is really not easy.
Sadness is always greater than joy.
Love is happy because of ignorance,
but it has entered into a complicated and confusing marriage.
Foreword
Lately I have been listening to a lot of relationship programs, marriage shows for example. Maybe I am getting old. So I worked out how to download the audio and save it on my phone to listen to in my spare time.
Environment
- Python 3.9
- PyCharm
Modules used
- requests
Module introduction
- requests
requests is a very practical HTTP client library for Python. It is commonly used to write crawlers and to test how a server responds. It is a third-party library dedicated to sending HTTP requests, and it is much simpler to use than urllib.
- parsel
parsel is a third-party Python library that combines CSS selectors, XPath, and regular expressions in one API.
It is developed by the Scrapy team, who split Scrapy's selector code out into a standalone package. It parses HTML and XML content and makes it easy to extract the data you need.
Compared with BeautifulSoup, its XPath and CSS selectors are, in my experience, faster and easier to use.
- re
re is Python's standard-library module for matching strings with regular expressions. A regular expression describes a pattern, so you can match text loosely and extract just the parts of a string you need. Regular expressions are available in most programming languages.
- os
os is short for "operating system". As the name suggests, the os module gives Python programs an interface to the operating system. Using it makes it easy to work with files and directories, and it also makes the code more portable across platforms.
- csv
csv is both a file format and the name of Python's standard-library module for reading and writing it. A CSV (comma-separated values) file can be opened in Excel or in any text editor. Fields are separated by commas (other delimiters can also be used), and Excel displays each field in its own column. A CSV file stores tabular data as plain text, so it works across operating systems.
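As a small illustration of how re and csv fit together, here is a minimal sketch using only the standard library. The sample string and the column names are made up for the example; they are not taken from the real page.

```python
import csv
import io
import re

# Made-up fragment that imitates track data embedded in a page
sample = '{"trackId":101,"title":"Episode one"},{"trackId":102,"title":"Episode, two"}'

# Two capture groups, so findall returns a list of (id, title) tuples
rows = re.findall(r'"trackId":(\d+),"title":"(.*?)"', sample)

# Write the rows as CSV into an in-memory buffer instead of a real file
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["track_id", "title"])
writer.writerows(rows)
csv_text = buffer.getvalue()

print(csv_text)
```

The second title contains a comma, so the csv module wraps that field in double quotes. This is why it is safer to use the module than to join fields with commas by hand.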
Module installation problems
- To install a third-party Python module:
  - Press Win + R, type cmd and click OK, then type the install command pip install followed by the module name (for example, pip install requests) and press Enter
  - Or click Terminal in PyCharm and type the same install command
- Reasons an installation can fail:
  - Failure 1: pip is not recognized as an internal command
    Solution: add Python (and its Scripts directory) to the PATH environment variable
  - Failure 2: a lot of red error text appears (read timed out)
    Solution: the network connection timed out, so switch to a mirror source:
    Tsinghua University: https://pypi.tuna.tsinghua.edu.cn/simple
    Alibaba Cloud: https://mirrors.aliyun.com/pypi/simple/
    University of Science and Technology of China: https://pypi.mirrors.ustc.edu.cn/simple/
    Huazhong University of Science and Technology: https://pypi.hustunique.com/
    Shandong University of Technology: https://pypi.sdutlinux.org/
    Douban: https://pypi.douban.com/simple/
    For example: pip3 install -i https://pypi.doubanio.com/simple/ module-name
  - Failure 3: cmd says the module is already installed, or that the installation succeeded, but it still cannot be imported in PyCharm
    Solution: you may have several Python installations (for example both Anaconda and standalone Python). Either uninstall one of them, or check that the Python interpreter selected in PyCharm is the one the module was installed into.
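For the third failure, it can help to ask Python itself which interpreter is running and whether that interpreter can see the module. The following is a minimal sketch using only the standard library; the helper name describe_environment is made up for this example.

```python
import importlib.util
import sys


def describe_environment(module_name):
    """Report which interpreter is running and whether it can find module_name."""
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,           # path of the Python that runs this script
        "module": module_name,
        "installed": spec is not None,           # None means this interpreter cannot import it
        "location": getattr(spec, "origin", None),
    }


for key, value in describe_environment("requests").items():
    print(f"{key}: {value}")
```

Run it once from cmd and once from PyCharm. If the interpreter paths differ, the module was probably installed into a different Python than the one PyCharm is using.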
Code
Send the request
First, we determine the target URL. Our goal is to get the address of each audio file.
We send a request to fetch the source code of the album page. Most readers will already know how to write this part.
import requests

url = 'https://www.ximalaya.com/album/37453303'
# Send a browser-like User-Agent with the request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36',
}
res = requests.get(url, headers=headers)
requests is a third-party Python module for sending HTTP requests. In this example, requests.get() sends a GET request to https://www.ximalaya.com/album/37453303, with the request headers passed through the headers parameter. The returned response object is stored in res, and res.text holds the page source.
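A request can hang or come back with an error status, so it can be worth wrapping the call. The sketch below is one possible way to do that, not part of the original script: the helper name fetch_text, the injectable get parameter, and the 10-second timeout are all assumptions made for illustration. The timeout argument and raise_for_status() are standard parts of requests.

```python
def fetch_text(url, headers, get=None, timeout=10):
    """Return the response body as text, raising an exception on HTTP errors."""
    if get is None:
        # Imported here so the helper can also be exercised with a stand-in function
        import requests
        get = requests.get
    response = get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # raises for 4xx and 5xx status codes
    return response.text
```

With this helper, the album page could be fetched as fetch_text(url, headers), and a failed request would stop the script with a clear error instead of passing an error page on to the regular expressions.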
Retrieve the data
# Square brackets must be escaped, otherwise re reads them as a character class
info_list = re.findall(r'"tracks":\[(.*?)\]', res.text)[1]
print(info_list)
re.findall() returns a list of every match of a pattern in a string. In this example, we use it to pull the contents of the tracks array out of the response body, and the index [1] picks one entry from the returned list of matches. We store that string in info_list and print it.
The matched content contains the data we want: the title and id of each track, which we extract next.
Note that this is not standalone JSON data (it is embedded in the page source), so we extract it with regular expressions. The complete listing at the end matches on "seoTrackList" and "trackName" instead; use whichever key appears in the page source you receive.
# Raw strings keep \d from being treated as a string escape
trackIds = re.findall(r'"trackId":(\d+)', info_list)
# print(trackIds)
titles = re.findall(r'"title":"(.*?)"', info_list)
# print(titles)
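To see how these patterns behave without touching the network, here is a minimal sketch that runs them on a made-up fragment. The sample text only imitates the shape of the page source and is an assumption for the example.

```python
import re

# Made-up text that imitates track data embedded in a page
page = (
    'window.__DATA__ = {"albumId":1,"tracks":['
    '{"trackId":101,"title":"Episode one"},'
    '{"trackId":102,"title":"Episode two"}'
    '],"page":1};'
)

# Unescaped brackets form a character class, so this pattern finds nothing
unescaped = re.findall('"tracks":[(.*?)]', page)

# Escaped brackets match the literal [ and ], and the group captures what is between them
blocks = re.findall(r'"tracks":\[(.*?)\]', page)
block = blocks[0]

track_ids = re.findall(r'"trackId":(\d+)', block)
titles = re.findall(r'"title":"(.*?)"', block)

print(unescaped)
print(track_ids)
print(titles)
```

Note that the non-greedy (.*?) stops at the first closing bracket, so this approach assumes the track entries themselves contain no ] character.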
Parse the data
By comparing the URLs of different tracks, we found that the trackId is all we need to reach each audio file.
audio = f'https://www.ximalaya.com/revision/play/v1/audio?id={trackId}&ptype=1'
We just substitute the trackId. Requesting this URL returns the real address of the audio file. Next, we write the code.
for trackId, title in zip(trackIds, titles):
    audio = f'https://www.ximalaya.com/revision/play/v1/audio?id={trackId}&ptype=1'
    print(audio)
    audio_res = requests.get(audio, headers=headers)
    audio_url = audio_res.json()['data']['src']
    print(audio_url)
zip() pairs up the elements of several iterables. Here zip(trackIds, titles) yields one (trackId, title) tuple per track, and the for loop unpacks each tuple into the variables trackId and title.
Inside the loop, an f-string inserts trackId into the API address, and the result is stored in audio.
Next, requests.get() sends a GET request to audio, again passing the request headers. Finally, json() parses the JSON response body into a Python dictionary, and ['data']['src'] looks up the src field, which is the real address of the audio file. We store it in audio_url.
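The pairing and the JSON lookup can be tried offline. In the minimal sketch below, the ids, titles, and response body are all made-up sample values, and json.loads() stands in for what json() on a response object roughly does with the body text.

```python
import json

# Made-up sample values in place of the lists extracted from the page
track_ids = ["101", "102"]
titles = ["Episode one", "Episode two"]

# zip pairs the first id with the first title, the second with the second, and so on
pairs = list(zip(track_ids, titles))

# Build one API address per track with an f-string
api_urls = [
    f"https://www.ximalaya.com/revision/play/v1/audio?id={track_id}&ptype=1"
    for track_id, _title in pairs
]

# Made-up response body shaped like the one the loop expects
sample_body = '{"ret": 200, "data": {"trackId": 101, "src": "https://example.com/audio/101.m4a"}}'
payload = json.loads(sample_body)   # JSON text -> Python dict
audio_url = payload["data"]["src"]  # nested lookup, same as ['data']['src']

print(pairs)
print(api_urls[0])
print(audio_url)
```

If the two lists ever have different lengths, zip() silently stops at the shorter one, so it is worth checking that the id and title counts agree.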
Save the data
# Still inside the for loop shown above
music_content = requests.get(audio_url, headers=headers).content
with open('music/' + title + '.mp3', mode='wb') as f:
    f.write(music_content)
print(title, 'saved successfully')
Next, we request the audio address and save the binary content locally. The with open() statement closes the file automatically when the block ends, even if an error occurs. In this example, it opens a file named after title, with the .mp3 extension, inside the music folder, in binary write mode ('wb'). We then call write() on the file object to write the audio content to disk.
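Track titles can contain characters such as ? or / that are not allowed in file names on some systems. The sketch below shows one possible way to guard against that; the helper names safe_filename and save_audio, the replacement character, and the sample title are assumptions for illustration and are not part of the original script.

```python
import os
import re
import tempfile


def safe_filename(title, replacement="_"):
    """Replace characters that Windows forbids in file names."""
    cleaned = re.sub(r'[\\/:*?"<>|]', replacement, title).strip()
    return cleaned or "untitled"


def save_audio(directory, title, content):
    """Write binary content to <directory>/<title>.mp3 and return the path."""
    os.makedirs(directory, exist_ok=True)  # create the folder if it is missing
    path = os.path.join(directory, safe_filename(title) + ".mp3")
    with open(path, mode="wb") as f:
        f.write(content)
    return path


# Demonstration in a temporary folder with fake bytes instead of real audio
with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_audio(os.path.join(tmp, "music"), "What is love? Part 1/2", b"fake audio bytes")
    saved_name = os.path.basename(saved_path)
    saved_size = os.path.getsize(saved_path)

print(saved_name, saved_size)
```

In the download loop, save_audio('music', title, music_content) would take the place of the with open() block.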
Complete code
import os
import re

import requests

# Folder that will hold the downloaded audio files
save_dir = 'music'
if not os.path.exists(save_dir):
    os.mkdir(save_dir)

url = 'https://www.ximalaya.com/album/37453303'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36',
}
res = requests.get(url, headers=headers)

# Pull the track list that is embedded in the page source
seoTrackList = re.findall(r'"seoTrackList":\[(.*?)\]', res.text)[0]
# print(seoTrackList)
trackIds = re.findall(r'"trackId":(\d+)', seoTrackList)
titles = re.findall(r'"trackName":"(.*?)"', seoTrackList)

for trackId, title in zip(trackIds, titles):
    # Example: https://www.ximalaya.com/revision/play/v1/audio?id=621413024&ptype=1
    audio = f'https://www.ximalaya.com/revision/play/v1/audio?id={trackId}&ptype=1'
    print(audio)
    audio_res = requests.get(audio, headers=headers)
    # The JSON response holds the real audio address under data -> src
    audio_url = audio_res.json()['data']['src']
    print(audio_url)
    music_content = requests.get(audio_url, headers=headers).content
    # Use a separate name for the file object so the folder name is not overwritten
    with open(os.path.join(save_dir, title + '.mp3'), mode='wb') as f:
        f.write(music_content)
    print(title, 'saved successfully')
Summary
This article walked through a practical Python script for downloading audio so that it can be saved to your phone. We first determine the target URL, then use requests.get() to send a GET request to it, passing the request headers.
From the page source, regular expressions extract the trackId and title of each track. For each trackId we request the audio API, call json() to parse the response body, and read the src field with ['data']['src'], storing the result in audio_url.
Finally, we request audio_url, write the binary content to an .mp3 file, and the audio is ready to play.