The text of text and images from the network, only to learn, exchange, not for any commercial purposes, belongs to original author, if any questions, please contact us for treatment.
PS: If necessary Python learning materials can be added to a small partner click the link below to obtain their own
http://note.youdao.com/noteshare?id=3054cce4add8a909e784ad934f956cef
Music is the life of the swap products, currently only play a lot of music can not be downloaded. Born of our technicians, how willing it?
Knowledge points:
-
requests
-
Regular Expressions
Development Environment:
-
Version: anaconda5.2.0 (python3.6.5)
-
Editor: pycharm
Third-party libraries:
-
requests
-
parcel
Web analytics
Target site: http://music.taihe.com/search?key=%E9%99%88%E7%B2%92
Analysis of the real address of the music
Choose a song by Chen particles cursory A Case Study
Open the Developer Tools, select network -> media -> Refresh the page to get to the music of the real address
But the resulting address is not read in view of the source code, Baidu music certainly be hidden. This time generally have two cases. The first is the use of JavaScript or splicing connection request encrypted second data is hidden. Since we do not know that there is a kind of situation. So we can only slowly to analyze data requests. After analysis, we can see the real address is present in the music inside the API http://musicapi.taihe.com/v1/restserver/ting?method=baidu.ting.song.playAAC&format=jsonp&callback=jQuery17206453751179783578_1544942124991&songid=243093242&from=web&_=1544942128336
并且我们请求这个 API 返回的是一个 json 数据(也就是python的字典数据类型)。只要我们使用字典的规则就能将我们的所有数据给提取到。
url拼接 获取所有数据
前面我们得到了音乐的真实地址,接下来我们就是分析真实地址的 url ,以期待得到下载所有音乐的诀窍。 仔细分析一下 url 就可以发现,?
后面的from
参数与_
即使不存在也不影响数据的请求。
并且后面的参数中的songid
其实就是歌曲的唯一id
,from
参数其实就是表明从哪个平台过来的
所以等一下我们下载音乐时,只要批量获取到歌曲的songid
就能将所有的歌曲给全部下载下来了。
批量获取singid
使用开发者工具,查看网页源码就能查看到songid
的位置,如果我们分析一个歌手页面的url
你会发现同样可以构造。
到此,整个网页分析就结束了。
实现效果
完整代码
1 import re 2 import requests 3 4 5 def get_songid(): 6 """获取音乐的songid""" 7 url = 'http://music.taihe.com/artist/2517' 8 response = requests.get(url=url) 9 html = response.text 10 sids = re.findall(r'href="/song/(\d+)"', html) 11 return sids 12 13 14 def get_music_url(songid): 15 """获取下载链接""" 16 api_url = f'http://musicapi.taihe.com/v1/restserver/ting?method=baidu.ting.song.playAAC&format=jsonp&songid={songid}&from=web' 17 response = requests.get(api_url.format(songid=songid)) 18 data = response.json() 19 print(data) 20 try: 21 music_name = data['songinfo']['title'] 22 music_url = data['bitrate']['file_link'] 23 return music_name, music_url 24 except Exception as e: 25 print(e) 26 27 28 def download_music(music_name, music_url): 29 """下载音乐""" 30 response = requests.get(music_url) 31 content = response.content 32 save_file(music_name+'.mp3', content) 33 34 35 def save_file(filename, content): 36 """保存音乐""" 37 with open(file=filename, mode="wb") as f: 38 f.write(content) 39 40 41 if __name__ == "__main__": 42 for song_id in get_songid(): 43 music_name, music_url = get_music_url(song_id) 44 download_music(music_name, music_url)