Article directory
Preamble
As a contemporary new youth, should you be able to make short videos more or less?
Haha, that contemporary self-media creator is good~
How often do you need some funny sounds when making a video? Or strange sounds? music wait~
How slow is the download one by one, we will use python to achieve batch download today~
environment/module/target
1. Goals
2. Development environment
Brothers, if you are just learning Python, don't install some other software, just install these two~
Python 环境
Pycharm 编辑器
3. Module
The modules used this time are mainly these two
requests # 数据请求模块
re # 正则表达式模块
Process explanation
This time, I wrote the process in detail, the kind that Xiaobai can understand. After reading it, everyone remembers the three-link, give me a little motivation to create, hehe~
First, we open the website, right-click, select check
, select network, refresh the page and scroll down, A page of page-4 and page-5 will appear.
A lot of data on these two pages is directly available here. Let’s find any one and click to play, then click media. There will be an audio file in the headers, which is the download address I marked.
You can play it directly or download
it directly. What if you want to get this address?
We directly copy this string of numbers, such as 32716, and then click the search box in the upper left corner to search.
After searching, we can see that page-5 has the audio link address here.
Audio headers can also be found here.
Then we click on headers and send a request directly to this url address.
First import the requests module
import requests
url is the link just now
url = 'https://手动替换一下/search/word-/page-5'
Then we add a header to disguise
Just copy the contents of the user-agent under the headers here, just
remember to add quotation marks
headers = {
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36'
}
Then send the request, print it to see the result
response = requests.get(url=url, headers=headers)
print(response.text)
There is too much printed content. We search for MP3 directly on it, and locate it precisely. Its title is in the link below the mp3 file.
Then we copy it over, use the regular to match the middle content, and replace the middle url with (.*?).
First import the re module
import re
Just copied the content over, .*? Surround it with parentheses.
To match from response.text, the matched content is received by the variable play_url_list.
play_url_list = re.findall('<div class="ui360 ui360-vis"><a href="(.*?)"></a></div>', response.text)
Then print it to see if there is a match
print(play_url_list)
You can see that the mp3 file is directly matched, and it is included in a list.
Then we also need its title name, and copy it over as well. Or the same operation, replace the url and name with .*?
To match from response.text, the matched content is received by the variable name_list.
name_list = re.findall('<a class="h6 text-white font-weight-bold" target="_blank" href=".*?" title="(.*?)">.*?</a>', response.text)
print it
print(name_list)
You can see that the data of the name has been obtained.
Traverse it, pack the obtained data together, extract them one by one, get a binary data content of it, and use the variable mp3_content to receive it
for play_url, name in zip(play_url_list, name_list):
mp3_content = requests.get(url=play_url, headers=headers).content
Then save it directly, with open give it the name of a folder, add the name, add the suffix of .mp3, save mode = wb , use the variable f.write to receive mp3_content
with open('音效\\' + name + '.mp3', mode='wb') as f:
f.write(mp3_content)
We have not written to automatically create a folder here, so you need to manually create a folder, and then write the name you named into it.
Then we print it and see the result.
print(name)
The relevant data content is saved in the folder you created
Note: You can manually replace all the urls by yourself. I will delete them here, otherwise they will be killed by mistake.
all code
import requests
import re
url = 'https://这里大家自己替换一下/search/word-/page-5'
headers = {
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36'
}
response = requests.get(url=url, headers=headers)
# print(response.text)
play_url_list = re.findall('<div class="ui360 ui360-vis"><a href="(.*?)"></a></div>', response.text)
name_list = re.findall('<a class="h6 text-white font-weight-bold" target="_blank" href=".*?" title="(.*?)">.*?</a>', response.text)
print(play_url_list)
print(name_list)
for play_url, name in zip(play_url_list, name_list):
mp3_content = requests.get(url=play_url, headers=headers).content
with open('音效\\' + name + '.mp3', mode='wb') as f:
f.write(mp3_content)
print(name)
Brothers, today's sharing is here, slip away~
Remember to like and favorite, give me motivation~