Python Crawler | Scraping Ximalaya (Himalaya) Audio

Ximalaya (Himalaya) is a well-known professional audio-sharing platform with more than 480 million users. Its catalog of hundreds of millions of audio items includes audio fiction, audiobooks, children's bedtime stories, comic sketches, and more, and it has become the fastest-growing and largest online mobile audio-sharing platform. Tonight's post breaks through the site's anti-crawler barrier, fetches soothing Ximalaya music in real time, and saves it to the local disk.

Knowledge points:

Development environment: Windows, PyCharm, requests, json

  1. Anti-crawler countermeasures

  2. File operations

  3. Network requests

  4. Data conversion

  5. Use of data types

 

 

1. First, import the requests library

import requests

1.png

 

6. Use json to convert the data obtained above into a dictionary (this requires importing the json module)

import json

5.png

 

4.    header = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"}

This is the answer to the site's anti-crawler mechanism: the header disguises the request as coming from a legitimate browser. The value was copied from the browser as User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36. Python does not recognize that line as written, so put User-Agent in quotes and put everything after the colon in quotes as well; the result is a valid dictionary entry. To find the value, press F12 -> Network -> Headers -> Request Headers -> User-Agent: Mozilla/5.0 ... (see the figure below).
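As a small illustration of the quoting rule above, the sketch below turns a header line copied from the browser's developer tools into a dictionary entry. The helper name header_line_to_dict is hypothetical and is not part of the original script; writing the dictionary by hand, as in step 4, works just as well.

```python
def header_line_to_dict(line):
    """Turn 'Name: value' (as shown in the browser) into {'Name': 'value'}."""
    # Split on the first colon only; the value itself may contain colons.
    name, _, value = line.partition(":")
    return {name.strip(): value.strip()}


# The line exactly as it appears under Request Headers.
copied = ("User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
          "AppleWebKit/537.36 (KHTML, like Gecko) "
          "Chrome/75.0.3770.100 Safari/537.36")

header = header_line_to_dict(copied)
print(header)
```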

 

3.png

 

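2. Set the url. How to get the link:

Open the Ximalaya website -> click "轻音乐" (Light Music) -> click "夜色钢琴曲" (Night Piano) -> after you choose a track a play button appears (do not click it yet) -> press F12 -> click Network -> click the play button -> the play request now shows up in the debugging window -> click the first entry under Name, album?.... -> click Headers in the right-hand pane -> expand General -> copy the address under Request URL, https://www.ximalaya.com/revision/play/album?albumId...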

 url = "https://www.ximalaya.com/revision/play/album?albumId=291718&pageNum=1&sort=1&pageSize=30"
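To see what the copied URL carries, the sketch below splits it into its query parameters using the standard library. The helper build_album_url is hypothetical, and the reading of the parameters (album ID, page number, sort order, page size) is an assumption based only on their names; whether the site accepts other values is not confirmed here.

```python
from urllib.parse import urlsplit, parse_qs, urlencode

url = "https://www.ximalaya.com/revision/play/album?albumId=291718&pageNum=1&sort=1&pageSize=30"

# Break the URL apart and flatten the query string into a plain dict.
parts = urlsplit(url)
params = {key: values[0] for key, values in parse_qs(parts.query).items()}
print(params)


def build_album_url(album_id, page_num=1, sort=1, page_size=30):
    """Rebuild a URL of the same shape (hypothetical helper)."""
    query = urlencode({"albumId": album_id, "pageNum": page_num,
                       "sort": sort, "pageSize": page_size})
    return f"{parts.scheme}://{parts.netloc}{parts.path}?{query}"


print(build_album_url(291718, page_num=2))
```

With the default arguments the helper reproduces the URL from step 2 exactly, which is a quick way to check that nothing was lost when copying it.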

 

2.png

 

 

6.png

 

3. Assign the fetched data to response and print response

 response = requests.get(url).text

 print(response)

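No data comes back, because the website has an anti-crawler mechanism, so the header above has to be added to disguise the request as a legitimate one.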
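One likely reason the bare request is rejected, offered as an assumption rather than something the site documents: by default requests announces itself with its own User-Agent, which a server can easily tell apart from a browser. The sketch below only builds the request without sending it, so the headers can be inspected offline.

```python
import requests

url = "https://www.ximalaya.com/revision/play/album?albumId=291718&pageNum=1&sort=1&pageSize=30"
header = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                        "AppleWebKit/537.36 (KHTML, like Gecko) "
                        "Chrome/75.0.3770.100 Safari/537.36"}

# What requests sends as User-Agent when no header is supplied.
default_ua = requests.utils.default_user_agent()
print(default_ua)

# Build (but do not send) the request to see the header that would go out.
prepared = requests.Request("GET", url, headers=header).prepare()
print(prepared.headers["User-Agent"])
```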

 

5. Because the header variable was added above, step 3 should be replaced with:

response = requests.get(url,headers = header).text

print(response)

 

4.png

 

After adding the header, re-running the script returns data (in JSON format). Copy the returned data, open http://www.bejson.com/, paste the data into the input box, and click "Format check" to see what format it is. JSON arrives as a str, while a dictionary is a dict. The difference between them: d = {'name': 'zs', 'gender': 'man'} is a dictionary, whereas s = '{"name": "zs", "gender": "man"}' is a string, specifically a JSON-formatted string.
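The str versus dict distinction can be checked directly in Python. This is a minimal sketch using the same d and s as above; note that JSON requires double quotes around keys and string values, so the single-quoted variant is rejected.

```python
import json

d = {"name": "zs", "gender": "man"}      # a dictionary
s = '{"name": "zs", "gender": "man"}'    # a string holding JSON text

print(type(d).__name__, type(s).__name__)

# json.loads turns the JSON string into a dict; json.dumps goes back.
converted = json.loads(s)
print(converted["name"])
round_trip = json.dumps(d)

# Single quotes are valid for a Python dict literal but not for JSON.
try:
    json.loads("{'name': 'zs'}")
    single_quotes_ok = True
except json.JSONDecodeError:
    single_quotes_ok = False
print(single_quotes_ok)
```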

 

7. Convert the data and assign the result to audio_data (the formatted view shows how the data is nested; the tool above was only used to check the type)

audio_data = json.loads(response)['data']['tracksAudioPlay']
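The indexing in step 7 raises a KeyError if either key is missing. A slightly more defensive sketch is shown below; the sample response body is made up for illustration, and only the key names data, tracksAudioPlay, and src come from the article. The function name extract_tracks is hypothetical.

```python
import json

# Made-up, trimmed response body; the real one contains many more fields.
response = json.dumps({
    "data": {
        "tracksAudioPlay": [
            {"src": "https://example.com/audio/a.m4a"},
            {"src": "https://example.com/audio/b.m4a"},
        ]
    }
})


def extract_tracks(response_text):
    """Return the track list, or an empty list if the keys are absent."""
    parsed = json.loads(response_text)
    return parsed.get("data", {}).get("tracksAudioPlay", [])


audio_data = extract_tracks(response)
print(len(audio_data))
```

Returning an empty list keeps the later for loop from failing when the site answers with an unexpected payload.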

 

7.png

 

8. Loop over the data to get each link and file name

for audio_info in audio_data:

    music_url = audio_info['src']

    music_name = music_url.split('/')[-1]
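The loop above can be tried without touching the network. In this sketch the track list is made up, and two cases are assumptions not confirmed by the article: a src that carries a query string, and a src that is missing. The helper name_from_url is hypothetical.

```python
import posixpath
from urllib.parse import urlsplit

# Made-up sample data in the shape used in step 7.
audio_data = [
    {"src": "https://example.com/group/track-01.m4a"},
    {"src": "https://example.com/group/track-02.m4a?sign=abc"},
    {"src": None},
]


def name_from_url(music_url):
    """Take the last path segment, ignoring any query string."""
    return posixpath.basename(urlsplit(music_url).path)


names = []
for audio_info in audio_data:
    music_url = audio_info.get("src")
    if not music_url:
        # Skip entries that carry no playable link.
        continue
    names.append(name_from_url(music_url))

print(names)
```

For URLs without a query string this gives the same result as music_url.split('/')[-1]; the difference only shows when something follows a question mark.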

 

8.png

 

9. Save the fetched music data to the hard disk
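The code for this step survives only as a screenshot, so the following is a sketch of one way to write it, not the author's exact code. The function name save_audio is hypothetical; the folder name music is taken from the description of the result further down. The demo writes placeholder bytes to a temporary directory so it runs without a network connection.

```python
import os
import tempfile


def save_audio(music_name, content, folder="music"):
    """Write the raw audio bytes to folder/music_name and return the path."""
    os.makedirs(folder, exist_ok=True)   # create the folder on first use
    path = os.path.join(folder, music_name)
    with open(path, "wb") as f:          # binary mode: audio is not text
        f.write(content)
    return path


# Inside the loop from step 8 this might be called as:
#   content = requests.get(music_url, headers=header).content
#   save_audio(music_name, content)

with tempfile.TemporaryDirectory() as tmp:
    saved = save_audio("demo.m4a", b"placeholder-bytes",
                       folder=os.path.join(tmp, "music"))
    print(os.path.basename(saved), os.path.getsize(saved))
```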

 

9.png

 

In the left-hand column, click the music folder to expand it, right-click any song, and choose "Show in Explorer" to open the location of the audio file.

 

10.png

 

Notes

 

The numbers 1-9 give the order in which the code was written; in the finished script the lines appear in the order 1, 6, 4, 2, 3, 5, 7, 8, 9.

This is because code was constantly being added and removed while fixing bugs.

 

 

 

 

 

What you learn has to be practiced, so I keep practicing along the way.


Origin www.cnblogs.com/RyanLea/p/11072070.html