Crawling NetEase Cloud Music with Python: search for and download songs!
1. Preparations
I tried NetEase Cloud Music and found that it is a dynamic page whose content is generated by JavaScript, so it is not easy to crawl directly. This time we let a third-party website "help" us with the crawling.
I found a third-party website that can look up song IDs. We crawl its source code and extract the IDs from it (a bit roundabout, admittedly).
2. Observing "in the field"
We open this site and find that it can search five download sources:
Today our goal is to download songs from NetEase Cloud Music. Interested readers can try crawling songs from the other sources; the principle is the same. We search for any song and look at the URL.
We notice that in the URL, "kw=" is followed by the song name, while "lx=" is followed by the download source.
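The query-string structure described above can be sketched with the standard library. The helper name build_search_url is hypothetical, and the "wy" value for "lx=" is an assumption taken from the search URL used in Chapter 4. Unlike plain string formatting, urlencode also escapes spaces and non-ASCII characters in the song name.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "https://music.hwkxk.cn/"  # search page of the third-party site


def build_search_url(song_name, source="wy"):
    """Return the search URL: kw carries the song name, lx the download source."""
    query = urlencode({"kw": song_name, "lx": source})
    return BASE_URL + "?" + query


search_url = build_search_url("久石譲 Summer")
# Parsing the URL again recovers the two parameters.
params = parse_qs(urlsplit(search_url).query)
print(search_url)
```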
Let us look at the source code:
We see that the a tags contain exactly what we want: the download link and the song name.
With song names and download links in hand the rest is easy, so the next part is the code!
3. Start coding!
Here I built a UI, as well as another way to download: downloading by link. Readers who are not interested in downloading by link can skip this chapter and go to Chapter 4: Search and Download.
Downloading by link
First of all, we need to know one URL: http://music.163.com/song/media/outer/url?id=.mp3
What is it? It is a download link: put a song's ID after "id=" and the song can be downloaded.
Open any song on the NetEase Cloud Music website and its address contains exactly this "id=???" format. We just use a regular expression to extract the ID and fill it into the URL above. The code:
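import re
import urllib.request
import tkinter.messagebox as box

# Download function
def urldownload():
    url = lefturl.get()  # lefturl is the input box of my UI; without the UI, use input() instead
    try:
        # Extract the song ID (digits only, so trailing parameters are ignored)
        urlid = re.findall(r'id=(\d+)', url)[0]
        # Build the download URL
        durl = 'http://music.163.com/song/media/outer/url?id=%s.mp3' % urlid
        # Download the song (replace the placeholder with an absolute path and file name)
        urllib.request.urlretrieve(durl, r'absolute_path\name.mp3')
        # Report that the download has finished
        box.showinfo(title='Notice', message='The music has been downloaded!\nSaved to the download folder!')
    except Exception:
        box.showerror(title='Error', message='Invalid download link!')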
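The ID extraction can also be tried on its own, without the UI. The following is a minimal sketch: the function name song_download_url and the sample address are made up for illustration, and the pattern assumes that the song ID consists only of digits.

```python
import re

DOWNLOAD_TEMPLATE = "http://music.163.com/song/media/outer/url?id=%s.mp3"


def song_download_url(page_url):
    """Extract the numeric song ID from a song address and fill it into the download link."""
    # Require "?" or "&" before "id=" so parameters such as "userid=" are not matched.
    match = re.search(r"[?&]id=(\d+)", page_url)
    if match is None:
        raise ValueError("no song ID found in %r" % page_url)
    return DOWNLOAD_TEMPLATE % match.group(1)


# Hypothetical song address with an extra parameter after the ID.
print(song_download_url("https://music.163.com/#/song?id=443242&userid=1"))
```

Raising an error when no ID is found lets the caller decide how to report the problem, for example with a message box.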
4. Search and Download
To get the download links and names, we first have to fetch the source code of the page:
import requests

# Search function
def searchdownload(name):
    # URL format taken from the site's request headers
    url = 'https://music.hwkxk.cn/?kw=%s&lx=wy' % name
    html = requests.get(url=url).text
    print(html)
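But when we run it, the output is garbled. What is going on?
A likely cause is that the response headers do not declare a charset, in which case requests falls back to ISO-8859-1 for text, while the page itself is UTF-8. We can encode the page content back into bytes with the single-byte ISO-8859-1 encoding and then decode those bytes as UTF-8. The modified code: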
import requests

# Search function
def searchdownload(name):
    # URL format taken from the site's request headers
    url = 'https://music.hwkxk.cn/?kw=%s&lx=wy' % name
    html = requests.get(url=url).text
    # Undo the wrong ISO-8859-1 decoding, then decode the bytes as UTF-8
    html = html.encode('ISO-8859-1')
    html = html.decode('UTF-8')
    print(html)
Now the output is no longer garbled.
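To see why this round trip works, the garbling can be reproduced offline. This is a small sketch that assumes the server sends UTF-8 bytes which are then wrongly decoded as ISO-8859-1; the sample string is only an example.

```python
original = "久石譲 - Summer"

# What the server sends: the text encoded as UTF-8 bytes.
raw_bytes = original.encode("UTF-8")

# Decoding those bytes with the wrong charset produces garbled text.
garbled = raw_bytes.decode("ISO-8859-1")

# ISO-8859-1 maps every byte to exactly one character, so encoding the
# garbled text with it restores the original bytes without loss.
repaired = garbled.encode("ISO-8859-1").decode("UTF-8")

print(repaired == original)  # True
```

With requests, setting the response's encoding attribute to 'utf-8' before reading .text should have the same effect.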
Next, we crawl the song names and download links:
We see that the class of one song's tag is "btn btn-xs btn-success", but that is just a single song; we need a selector that matches all of the songs.
Looking at the "Styles" panel on the right, we find that this class is shared by all of these a tags.
Now for the code:
import bs4
import requests

# Search function
def searchdownload(name):
    # URL format taken from the site's request headers
    url = 'https://music.hwkxk.cn/?kw=%s&lx=wy' % name
    html = requests.get(url=url).text
    html = html.encode('ISO-8859-1')
    html = html.decode('UTF-8')
    # Parse the page
    soup = bs4.BeautifulSoup(html, "lxml")
    # Find the target tags
    link_0 = soup.select('.btn-success')
    print(link_0)
After running the function, Python prints a list:
[<a class="btn btn-xs btn-success" download="久石譲 - Summer.flac" href="https://music.hwkxk.cn/api/?source=WYSQ&id=1417064063" target="_blank">无损</a>, <a class="btn btn-xs btn-success" download="久石譲 - Summer.flac" href="https://music.hwkxk.cn/api/?source=WYSQ&id=443242" target="_blank">无损</a>, <a class="btn btn-xs btn-success" download="keshi - summer.flac" href="https://music.hwkxk.cn/api/?source=WYSQ&id=1378192821" target="_blank">无损</a>, <a class="btn btn-xs btn-success" download="Calvin Harris - Summer.flac" href="https://music.hwkxk.cn/api/?source=WYSQ&id=28306554" target="_blank">无损</a>, <a class="btn btn-xs btn-success" download="Mazza - Summer Klaas Remix.flac" href="https://music.hwkxk.cn/api/?source=WYSQ&id=28729445" target="_blank">无损</a>, <a class="btn btn-xs btn-success" download="久石譲 - Summer.flac" href="https://music.hwkxk.cn/api/?source=WYSQ&id=444292" target="_blank">无损</a>, <a class="btn btn-xs btn-success" download="keshi - summer.flac" href="https://music.hwkxk.cn/api/?source=WYSQ&id=1361455890" target="_blank">无损</a>, <a class="btn btn-xs btn-success" download="徐梦圆 - summer.flac" href="https://music.hwkxk.cn/api/?source=WYSQ&id=34779102" target="_blank">无损</a>, <a class="btn btn-xs btn-success" download="LJY - Summer (夏).flac" href="https://music.hwkxk.cn/api/?source=WYSQ&id=485263993" target="_blank">无损</a>, <a class="btn btn-xs btn-success" download="Calvin Harris - Summer.flac" href="https://music.hwkxk.cn/api/?source=WYSQ&id=29460066" target="_blank">无损</a>, <a class="btn btn-xs btn-success" download="戈冧 - Summer.flac" href="https://music.hwkxk.cn/api/?source=WYSQ&id=1377103256" target="_blank">无损</a>, <a class="btn btn-xs btn-success" download="David Garrett - Summer.flac" href="https://music.hwkxk.cn/api/?source=WYSQ&id=17241229" target="_blank">无损</a>, <a class="btn btn-xs btn-success" download="cozy kev - summer.flac" href="https://music.hwkxk.cn/api/?source=WYSQ&id=1410153419" target="_blank">无损</a>, <a class="btn btn-xs btn-success" 
download="KMS - summer.flac" href="https://music.hwkxk.cn/api/?source=WYSQ&id=1418582038" target="_blank">无损</a>, <a class="btn btn-xs btn-success" download="Yogee New Waves - Summer.flac" href="https://music.hwkxk.cn/api/?source=WYSQ&id=29979351" target="_blank">无损</a>, <a class="btn btn-xs btn-success" download="Marshmello - SuMmeR.flac" href="https://music.hwkxk.cn/api/?source=WYSQ&id=39324020" target="_blank">无损</a>, <a class="btn btn-xs btn-success" download="Kesha - Summer.flac" href="https://music.hwkxk.cn/api/?source=WYSQ&id=1419676441" target="_blank">无损</a>, <a class="btn btn-xs btn-success" download="Calvin Harris - Summer R3hab Ummet Ozcan Remix.flac" href="https://music.hwkxk.cn/api/?source=WYSQ&id=28696074" target="_blank">无损</a>, <a class="btn btn-xs btn-success" download="BROCKHAMPTON - SUMMER.flac" href="https://music.hwkxk.cn/api/?source=WYSQ&id=502242134" target="_blank">无损</a>, <a class="btn btn-xs btn-success" download="Dan Martinez - 夏日狂欢.flac" href="https://music.hwkxk.cn/api/?source=WYSQ&id=1320098269" target="_blank">无损</a>]
This is what we want. First, let us print just the first item of the list:
print(link_0[0])
Output:
<a class="btn btn-xs btn-success" download="久石譲 - Summer.flac" href="https://music.hwkxk.cn/api/?source=WYSQ&id=1417064063" target="_blank">无损</a>
Looking closer, we find that "download" holds the song name and "href" holds the download link. We just append ".get('href')" to "link_0[0]" to get the link, and the name works the same way with ".get('download')". If a result does not exist, we store None instead.
    # Find the targets; a missing search result raises IndexError
    try:
        link_0 = soup.select('.btn-success')[0].get('href')
        name_0 = soup.select('.btn-success')[0].get('download')
    except IndexError:
        link_0 = None
        name_0 = None
    try:
        link_1 = soup.select('.btn-success')[1].get('href')
        name_1 = soup.select('.btn-success')[1].get('download')
    except IndexError:
        link_1 = None
        name_1 = None
    try:
        link_2 = soup.select('.btn-success')[2].get('href')
        name_2 = soup.select('.btn-success')[2].get('download')
    except IndexError:
        link_2 = None
        name_2 = None
    try:
        link_3 = soup.select('.btn-success')[3].get('href')
        name_3 = soup.select('.btn-success')[3].get('download')
    except IndexError:
        link_3 = None
        name_3 = None
    try:
        link_4 = soup.select('.btn-success')[4].get('href')
        name_4 = soup.select('.btn-success')[4].get('download')
    except IndexError:
        link_4 = None
        name_4 = None
Finally, we store everything in a dictionary and return it:
    link_data = {
        "0_0": link_0,
        "0_1": name_0,
        "1_0": link_1,
        "1_1": name_1,
        "2_0": link_2,
        "2_1": name_2,
        "3_0": link_3,
        "3_1": name_3,
        "4_0": link_4,
        "4_1": name_4
    }
    return link_data
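The five near-identical blocks above can also be written as a loop. The following is only a sketch of that idea: collect_links and its limit parameter are hypothetical names, and the function relies only on each tag offering a .get() method, so plain dictionaries stand in for the a tags in the example.

```python
def collect_links(tags, limit=5):
    """Build the "<index>_0" (link) / "<index>_1" (name) dictionary from a list of tags."""
    link_data = {}
    for i in range(limit):
        # Results that do not exist are stored as None.
        tag = tags[i] if i < len(tags) else None
        link_data["%d_0" % i] = tag.get("href") if tag is not None else None
        link_data["%d_1" % i] = tag.get("download") if tag is not None else None
    return link_data


# Plain dictionaries stand in for the <a> tags here, since both offer .get().
sample = [
    {"href": "https://music.hwkxk.cn/api/?source=WYSQ&id=1378192821",
     "download": "keshi - summer.flac"},
    {"href": "https://music.hwkxk.cn/api/?source=WYSQ&id=28306554",
     "download": "Calvin Harris - Summer.flac"},
]
result = collect_links(sample)
print(result["0_1"], result["4_0"])
```

Inside searchdownload, it could be called as collect_links(soup.select('.btn-success')).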
With the download link and name, you can download the song simply by calling urllib.request.urlretrieve().
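As a rough sketch of that last step, the following assumes the dictionary layout returned above and saves each song into a download folder. The names safe_filename and download_all are hypothetical, and replacing characters that Windows does not allow in file names is an added precaution, not part of the original program.

```python
import os
import re
import urllib.request


def safe_filename(name):
    """Replace characters that are not allowed in Windows file names."""
    return re.sub(r'[\\/:*?"<>|]', "_", name)


def download_all(link_data, folder="download", count=5):
    """Download every song in link_data and return the list of saved paths."""
    os.makedirs(folder, exist_ok=True)
    saved = []
    for i in range(count):
        link = link_data.get("%d_0" % i)
        name = link_data.get("%d_1" % i)
        if link is None or name is None:
            continue  # this search result does not exist
        path = os.path.join(folder, safe_filename(name))
        urllib.request.urlretrieve(link, path)
        saved.append(path)
    return saved


print(safe_filename("AC/DC - Summer?.flac"))
```

A typical call would be download_all(searchdownload("Summer")), which needs network access and a reachable site.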
Conclusion
With what you have learned today, you should have gained a lot! I believe you are now one step closer on your way to learning Python!
by taoxichen
Only in boiling water can tea release the rich aroma of life.