Hands-on operation of JS reverse crawler (2)

This case was completed independently without reference to any materials. Although it is not a difficult JS reverse, it is still a bit difficult for novices. Without further ado, let's get down to business. The goal of this cracking is to download songs from music websites.

Target website:
In case of infringement, it is omitted here. I need my private self.

Basic idea:
search the name of the song, get the address of the song, and complete the download.

Reverse process:
1. Search for songs, through manual observation and search, it is not difficult to find the target request information under the JS panel under Network.

2. Let's take a look at the specific information of this request:

Headers:

Playload: It can be seen that there are many parameters, and a screenshot cannot fit.

 

 3. Find the encryption parameters. By searching different song names and comparing the request information, it can be seen that the parameters that change in each request are keyword and signature, as well as clienttime, mid, and uuid, a total of 5 parameters. Among them, the last three are obviously timestamps, and keyword is the name of the song to be searched. So the most critical parameter is signature. The next thing to do is to find the generation principle of this parameter.

4. In the developer state, click search and enter signature.

 , there is only one result, double-click to open it, and then click Beautify Print.

You can see the following interface:

5. Look at the code of this line:
h.signature = faultylabs.MD5(o.join("")),
the original signature parameter is generated by a function (faultylabs.MD5) with a parameter (o.join("")) of.
As you can see from the function name, this function encrypts the string with md5 and returns the uppercase letters.

If you can't see it, you can also dig the code here.
Put a breakpoint, refresh, move the mouse to MD5, the system prompts that this function is anonymous(a).

 Click this function to display the following interface:

 The function is very long and ends with something like this,

 Cut out the entire function to make a new js file, and then test the function of this function with any string parameter, and the result is a long string of
numbers and uppercase letters.

The result is the same as directly encrypting hashlib.md5(string.encode(encoding='utf-8')).hexdigest() with python's md5.

6. Now there are only parameters left, which is the generation principle of o.join(""). Obviously, it is to splice the strings of the list o.
So where does the list o come from? We check the small box in front of the join, and then execute:


When we get here, we look at the result of o in the Console panel, enter o, and get:

 Is this value of o familiar? That's right, the key-value pairs of the request parameter dictionary are spliced ​​with "=" as elements of the list. Then add the fixed string 'NVPh5oo715z5DIWAeQlhMDsWXXQV4hwt' before and after the list
. Note that here are some parameters, and in a certain order! Here is a pit.

At this point, all the encryption principles have been done.
So, are you done loading? not yet! Don't be too happy! We just handle the request and get the following results:

7. So how to get the music download link based on these results?
Go back to the original search results, click on any result, the address of the pop-up window is as follows, in the form of https://www.kugou.com/song/#hash={filehash}&album_id={album_id}

 As long as you get the two parameters of filehash and albumid, you can construct the address of the music request, and these two parameters have already been obtained.

 8. The next step is to obtain the song link address. Enter the music player window, right-click, and click Check to view the source code of the web page.
Search for mp3, the results are as follows:

However, according to the construction request of the above-mentioned song playback address, it is found that there is no mp3 file in the result. The judgment is that dynamic rendering technology is used, so I think of selenium requesting to obtain the song address, and then you can request to download it through requests.

At this point, the entire music downloading process is completed. The complete code above is as follows:    

'''
construct rquests address, then get '.mp3' address in the response. then download by requests it.
'''
import requests,pprint,re,json,time,hashlib
from selenium import webdriver
# from selenium.webdriver.chrome.options import Options
headers = {
    'authority': 'complexsearch.kugou.com',
    'sec-ch-ua': '" Not;A Brand";v="99", "Google Chrome";v="97", "Chromium";v="97"',
    'sec-ch-ua-mobile': '?0',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36',
    'sec-ch-ua-platform': '"Windows"',
    'accept': '*/*',
    'sec-fetch-site': 'same-site',
    'sec-fetch-mode': 'no-cors',
    'sec-fetch-dest': 'script',
    'referer': 'https://www.kugou.com/',
    'accept-language': 'en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7',
    'cookie': 'kg_mid=1a8bf97ec26db1b38258848ec815e0a4',
}

def get_sign(params):
    '''
    return 'signature' parameter to be used in requests.
    Param: a dict of requests paramter
    '''
    string=''
    for i in params:        
        temp=i+'='+params[i]
        string+=temp
    # print(string)
    string='NVPh5oo715z5DIWAeQlhMDsWXXQV4hwt'+string+'NVPh5oo715z5DIWAeQlhMDsWXXQV4hwt' #add a string before and after the string
    return hashlib.md5(string.encode(encoding='utf-8')).hexdigest().upper()  #md5 encrpted,then capitilized.

def search_music(song_name):
    '''
    return a list of all searching results.
    '''
    search_results=[]
    params = {'bitrate':'0',
    'callback': 'callback123',
    'clienttime':str(int(time.time()*1000)),
    'clientver':'2000',
    'dfid':'-',  
    'inputtype':'0',
    'iscorrection':'1',
    'isfuzzy':'0',
    'keyword': song_name,
    'mid':str(int(time.time()*1000)),
    'page': '1',
    'pagesize': '30',
    'platform':'WebFilter',
    'privilege_filter':'0',
    'srcappid':'2919',   
    'tag':'em',
     'token':'',
    'userid':'0',
    'uuid':str(int(time.time()*1000)),
}
    params['signature']=get_sign(params)
    # print(params['signature'])
    # print(params)  
    response = requests.get('https://complexsearch.kugou.com/v2/search/song', headers=headers, params=params)
    # print(response.status_code)
    
    text=response.text[12:-2]
    search_data=json.loads(text)['data']['lists']
    # pprint.pprint(search_results)
    # print(len(search_results))
    for i in search_data:
        # print(i)
        music_list={}
        music_list['FileName']=i['FileName'].replace('<em>','').replace('</em>','')
        music_list['FileHash']=i['FileHash']
        music_list['AlbumID']=i['AlbumID']
        print(music_list['FileName'])
        # print(i['FileName'].replace('<em>','').replace('</em>',''),i['FileHash'],i['AlbumID'])
        search_results.append(music_list)
    # print(search_results)
    return search_results

def get_song_address(search_results):
    
    filehash=search_results[0]['FileHash']
    album_id=search_results[0]['AlbumID']
    url=f'https://www.kugou.com/song/#hash={filehash}&album_id={album_id}'
    # print(url)
    options=webdriver.ChromeOptions()
    options.add_argument('user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"')
    options.add_argument('--headless')
    options.add_experimental_option('excludeSwitches', ['enable-logging']) #remove all annoying reminder like DevTools listening on ws://127.0.0.1:51589/devtools/browse
    driver=webdriver.Chrome(options=options)
    # driver=webdriver.PhantomJS() #no working

    driver.get(url)
    time.sleep(1)
    # html=driver.page_source
    # print(type(html))
    music_address=driver.find_element_by_id('myAudio').get_attribute('src')
    driver.close()
    print(music_address)
    return music_address
    

def get_song(music_address):
    with open('D:\\kugoumusic\\'+song_name+'.mp3','wb') as file: #downloading songs
        res=requests.get(music_address,headers=headers)
        file.write(res.content)   

if __name__=='__main__':
    song_name=input('请输入要下载的歌名:')
    print(f'开始搜索:  {song_name}  请耐心等待!')
    search_results=search_music(song_name)
    print(f'正在抓取歌曲地址:')
    music_address=get_song_address(search_results)
    get_song(music_address)
    print(song_name,"下载完成!")

   

Summarize:

1. Be patient enough to read the JS code and find out the generation principle involved.

2. The actual cracking process is not the above process.
In fact, it is like this:
a. First search the song name, click one of the search results, open the window to play the song, and find the song download address.
b. After finding it, I found that the address was completely unconnected with the requested address, at least I didn’t see it. And the result of the request is that there is no song address. So I thought of selenium request.
c. The question is how to construct the request address? As can be seen from the window address of the following song playback, only filehash and albumid are needed. https://www.kugou.com/song/#hash=CADCA036BF15D4D06EFAD005DD2F40F2&album_id=970586
These two parameters can be found from the song search results.
d. So the question is transformed into how to complete the search? That is, how to obtain the search results by generating and adding parameters, and then constructing the request. This is where the first half of this article is devoted.

Guess you like

Origin blog.csdn.net/weixin_45387160/article/details/122762865