[Hands-on crawler] Reversing Douyin's X-Bogus parameter with Python and JS to fetch N videos

Preface

I have previously covered some JS reverse-engineering basics, but those were fairly elementary and mostly limited to patching in missing JS functions. This time, using Douyin as the example, we will try two new techniques: patching the environment, and debugging with breakpoints in the developer tools.

1. Goal analysis

1. Filter interface

First, pick any user's homepage, which lists that user's works. The goal is to collect the links to all of a given user's works and then download them.

The homepage of a certain UP owner

The page issues many requests; after filtering, the target interface is found:

Searching its response for the keyword video turns up several URLs.

Download link

Open one of them to check whether it is a real download link.

It is.
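The keyword search above can also be done in code once a response has been saved. Here is a minimal sketch under stated assumptions: the `sample` object and its field names are made up for illustration and are not Douyin's actual response schema, and `collectVideoUrls` is a hypothetical helper. It walks a parsed JSON value and collects every URL string found under a key path that contains `video`.

```javascript
// Recursively collect URL strings found under any key path that mentions "video".
// The traversal is generic; it assumes nothing about the real response schema.
function collectVideoUrls(node, path = '', found = []) {
  if (typeof node === 'string') {
    if (/video/i.test(path) && /^https?:\/\//.test(node)) found.push(node);
  } else if (Array.isArray(node)) {
    node.forEach((item, i) => collectVideoUrls(item, `${path}[${i}]`, found));
  } else if (node !== null && typeof node === 'object') {
    for (const [key, value] of Object.entries(node)) {
      collectVideoUrls(value, `${path}.${key}`, found);
    }
  }
  return found;
}

// Hypothetical sample data, for illustration only.
const sample = {
  items: [
    { desc: 'first', video: { play: ['https://example.com/a.mp4'] }, cover: 'https://example.com/a.jpg' },
    { desc: 'second', video: { play: ['https://example.com/b.mp4'] } },
  ],
};

const videoUrls = collectVideoUrls(sample);
console.log(videoUrls);
```

Matching on the key path rather than on the URL text keeps cover images and other links out of the result, at the cost of depending on how the keys are named.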

2. Check the request header

No screenshot here. The request headers contain few unusual fields; only Cookie stands out.

3. Payload

Three fields in the payload are ciphertext.

It is not yet clear which one is decisive: all three may be required, or perhaps only one.

2. Logical analysis

1. Find the request entry

Start with the simplest method, searching for keywords, and fall back to other methods only if that fails. On this site a search for X-Bogus does return matches, but after setting breakpoints on them it turns out that execution never reaches them, so this is not demonstrated here. The same goes for the other two ciphertext fields.

So the most reliable approach is to locate the code from the request's Initiator.

The request turns out to be an XHR (ajax) request, but many other requests are sent from the same place, which makes the target interface hard to track. There is, however, a way to trace only this interface: an XHR/fetch breakpoint.

Add a breakpoint there and enter the target URL, but remember to remove all earlier breakpoints first.

Execution now pauses here automatically, and the target URL is displayed.

breakpoint
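The same "pause only on this URL" idea can also be expressed in code by wrapping `XMLHttpRequest.prototype.open`. This is an alternative to the DevTools breakpoint, not the method used in this article. In the minimal sketch below, `hookOpen` and `FakeXhr` are hypothetical names; `FakeXhr` only stands in for the browser's `XMLHttpRequest` so that the snippet runs outside a browser.

```javascript
// Wrap XhrClass.prototype.open so that onMatch fires for URLs containing `needle`.
function hookOpen(XhrClass, needle, onMatch) {
  const originalOpen = XhrClass.prototype.open;
  XhrClass.prototype.open = function (method, url, ...rest) {
    if (String(url).includes(needle)) {
      onMatch(method, String(url)); // in a browser, a `debugger;` statement could go here
    }
    return originalOpen.call(this, method, url, ...rest);
  };
  // Return a function that restores the original method.
  return () => { XhrClass.prototype.open = originalOpen; };
}

// Stand-in for the browser's XMLHttpRequest (assumption, for demonstration only).
class FakeXhr {
  open(method, url) { this.method = method; this.url = url; }
}

const hits = [];
const unhook = hookOpen(FakeXhr, '/aweme/v1/web/aweme/post/', (method, url) => hits.push(url));

new FakeXhr().open('GET', '/aweme/v1/web/aweme/post/?count=18'); // matches
new FakeXhr().open('GET', '/some/other/endpoint');               // ignored
unhook();
new FakeXhr().open('GET', '/aweme/v1/web/aweme/post/?count=18'); // hook removed, not recorded

console.log(hits);
```

In a real page one would pass `XMLHttpRequest` itself; the DevTools breakpoint is usually more convenient because it needs no injected code.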

At this point, entering this in the console shows that the X-Bogus (XB) value has already been generated; it is at the end of the URL:

"/aweme/v1/web/aweme/post/?device_platform=webapp&aid=6383&channel=channel_pc_web&sec_user_id=MS4wLjABAAAAqwlfqpCgGCpMAxMEQm_evPUupsTBamwkG5-s6LWqqOgBTv9tniP-7P6QrjK4-m1N&max_cursor=0&locate_query=false&show_live_replay_strategy=1&need_time_list=1&time_list_query=0&whale_cut_token=&cut_version=1&count=18&publish_video_strategy_type=2&pc_client_type=1&version_code=170400&version_name=17.4.0&cookie_enabled=true&screen_width=1920&screen_height=1080&browser_language=zh-CN&browser_platform=MacIntel&browser_name=Chrome&browser_version=116.0.0.0&browser_online=true&engine_name=Blink&engine_version=116.0.0.0&os_name=Mac+OS&os_version=10.15.7&cpu_core_num=8&device_memory=8&platform=PC&downlink=10&effective_type=4g&round_trip_time=150&webid=7304127941348312585&msToken=m481D4fH-oOr3yUt_GMWxmhIvvFhjoWXQnWK8AK0ZaKuEwbkp-goBkCM5C6N4u03IiMM2JGF064qCIXQjKgQm-SOnXG3OSDbTLhDezOjda_r4pzEL3Fl9MWytJ904EJI&X-Bogus=DFSzswVOcYsANSTItmLIGMm4pIDg"

So step back one frame in the call stack.

Keep following the calls upward.

At this point it is certain that this line produces the ciphertext, but the two values _0xc26b5e and _0x1f1790 keep changing. A brief note on the syntax here.

_0x2458f0['y']++) : _0xcc6308[++_0x2e1055] = _0x2458f0['apply'](_0xc26b5e, _0x1f1790);

apply is a method of JavaScript functions. It calls the function with a specified context (the this value) and takes the arguments as an array or array-like object.

In this line, _0x2458f0['apply'] accesses the apply method of _0x2458f0, so the expression calls _0x2458f0 with the elements of an array passed as its arguments.

Specifically:

  • _0x2458f0: the function being called.
  • ['apply']: the standard method of function objects used to make the call; bracket notation is equivalent to .apply.
  • _0xc26b5e: the first argument to apply, used as the this value inside _0x2458f0.
  • _0x1f1790: the second argument to apply, an array (or array-like object) whose elements become the arguments of _0x2458f0.

The line therefore calls _0x2458f0 with _0xc26b5e as its context and the elements of _0x1f1790 as its arguments, then stores the result in _0xcc6308[++_0x2e1055].
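To make the roles concrete, here is a small sketch with readable names. The names `greet`, `context`, `args`, `stack` and `top` are made up for illustration; only the call shape mirrors the obfuscated line.

```javascript
// A plain function that reads `this` and takes two positional arguments.
function greet(greeting, punctuation) {
  return `${greeting}, ${this.name}${punctuation}`;
}

const context = { name: 'Douyin' }; // plays the role of _0xc26b5e (the `this` value)
const args = ['Hello', '!'];        // plays the role of _0x1f1790 (the argument array)

// fn.apply(thisArg, argsArray) calls fn with `this = thisArg`
// and the array elements as individual arguments.
const viaApply = greet['apply'](context, args);
const viaCall = greet.call(context, ...args); // equivalent spelling

// Same shape as _0xcc6308[++_0x2e1055] = ...: pre-increment the index, then store.
const stack = [];
let top = -1;
stack[++top] = viaApply;

console.log(viaApply, top);
```

The pre-increment store suggests that _0xcc6308 is used like a stack with _0x2e1055 as its top index, although the article does not confirm this.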

It is still hard to pin down the exact call, though. Two tools help here: log breakpoints (logpoints) and conditional breakpoints. First, set a log breakpoint on _0x2458f0['apply'](_0xc26b5e, _0x1f1790) with the expression console.log(_0x2458f0['apply'](_0xc26b5e, _0x1f1790)). This produces a lot of output, but it can be filtered by keyword, such as the first few characters of the URL or of the XB value:

The XB value does appear in the output, and so does the URL. Next, use a conditional breakpoint to pause only when the XB value is produced, then step into the function. The condition is _0x2458f0['apply'](_0xc26b5e, _0x1f1790).length==28: the XB value is 28 characters long, so its length can serve as the test.
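The two breakpoint types can be imitated in plain code, which helps show what each one does. This is a minimal sketch: `traceReturns` is a hypothetical helper, and `dispatch` is a stand-in for the obfuscated function, not the real one.

```javascript
// Wrap fn so that every return value is logged (like a log breakpoint)
// and onHit fires when `condition` holds (like a conditional breakpoint).
function traceReturns(fn, condition, onHit) {
  return function (...args) {
    const result = fn.apply(this, args);
    console.log('[trace]', result);
    if (condition(result)) onHit(result, args);
    return result;
  };
}

// The XB value in this article is a 28-character string.
const looksLikeXBogus = (value) => typeof value === 'string' && value.length === 28;

// Stand-in for the obfuscated dispatcher (assumption, for demonstration only).
const dispatch = (x) => x;

const captured = [];
const traced = traceReturns(dispatch, looksLikeXBogus, (result) => captured.push(result));

traced(42);                             // not a string: ignored
traced('/aweme/v1/web/aweme/post/');    // wrong length: ignored
traced('DFSzswVOcYsANSTItmLIGMm4pIDg'); // 28 characters: captured
```

One difference is worth noting. The wrapper calls the function once and reuses the result, whereas the breakpoint expressions above call apply an extra time on each evaluation. If the traced function has side effects, that extra call could change the program's behavior.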

2. Find the encryption function

Refresh the page and execution pauses at the breakpoint; inspect the value.

Then step into the function, and the XB value appears:

return value

The encryption function has now been located.

3. Code implementation

3.1 JS part

There are two ways to implement this: patching in the missing functions, or patching the environment.

3.1.1 Patching functions

Start with the first option: copy the entire file and run it directly.

ReferenceError: window is not defined

Add window = global

ReferenceError: Request is not defined

Locate the code:

var _0x2aa7e4 = Request && Request instanceof Object
, _0x2b58b8 = Headers && Headers instanceof Object;

These are two Boolean checks; evaluating them in the browser console shows that both are true. Modify the code:

var _0x2aa7e4 = true
  , _0x2b58b8 = true;
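The original line throws instead of evaluating to false because reading an undeclared identifier raises a ReferenceError before && can short-circuit. The sketch below shows this with a made-up identifier, `NotDeclaredAnywhere`. A stand-in is used because recent Node.js versions already define Request and Headers globally, so the error may not reproduce there. It also shows an alternative to hard-coding true: defining a stub global.

```javascript
// Reading an undeclared identifier throws before `&&` can short-circuit.
function unsafeCheck() {
  return NotDeclaredAnywhere && NotDeclaredAnywhere instanceof Object;
}

// `typeof` never throws on undeclared identifiers, so this form is safe.
function safeCheck() {
  return typeof NotDeclaredAnywhere !== 'undefined' &&
    NotDeclaredAnywhere instanceof Object;
}

let errorName = null;
try {
  unsafeCheck();
} catch (err) {
  errorName = err.name;
}

const before = safeCheck();

// Alternative to hard-coding `true`: define a stub global so the check passes.
globalThis.NotDeclaredAnywhere = function NotDeclaredAnywhere() {};
const after = safeCheck();

console.log(errorName, before, after);
```

Hard-coding true, as done above, is the smallest change. A stub global has the advantage that any later use of the same name in the file also resolves.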

ReferenceError: document is not defined

What the code actually reads here is document['referrer']; copy a value from the console or from the request headers.

document = {
    // The DOM property is spelled "referrer" (the HTTP header is "Referer")
    "referrer": 'https://www.douyin.com/user/MS4wLjABAAAAjemOgh7N4uocHHEMmnTrewBlqxuGnVMPr4kVZv6h12s',
}

TypeError: document.addEventListener is not a function

Handle it the same way, by adding a stub:

document = {
    "referrer": 'https://www.douyin.com/user/MS4wLjABAAAAjemOgh7N4uocHHEMmnTrewBlqxuGnVMPr4kVZv6h12s',
    // Empty stub so that the call no longer throws
    'addEventListener': function addEventListener() {},
}
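Adding stubs one error at a time works, but properties that are read silently as undefined raise no error and so go unnoticed. A common complement, not used in this article, is to wrap the stub in a Proxy that records every property the script reads. This is a minimal sketch: `watch` is a hypothetical helper, and the three accesses at the end merely simulate what a script might touch.

```javascript
// Wrap a stub object so that every property read is recorded,
// including reads of properties the stub does not define.
function watch(label, target, log) {
  return new Proxy(target, {
    get(obj, prop, receiver) {
      const value = Reflect.get(obj, prop, receiver);
      log.push(`${label}.${String(prop)} -> ${typeof value}`);
      return value;
    },
  });
}

const accessLog = [];
const documentStub = watch('document', {
  referrer: 'https://www.douyin.com/',
  addEventListener() {},
}, accessLog);

// Simulated accesses, standing in for the script under test.
documentStub.referrer;
documentStub.addEventListener('click', () => {});
documentStub.cookie; // not stubbed yet: logged with type "undefined"

console.log(accessLog);
```

Entries logged with type undefined are the candidates for the next stub to add.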

Now no more errors are reported. The XB value can then be obtained by exposing the encryption function located earlier, _0x5a8f25, through a global variable.
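The global-variable idea can be sketched as follows. Everything here is a stand-in: `vendorBundle`, `innerSign` and `getXBogus` are hypothetical names, `innerSign` does not compute a real X-Bogus value, and the article does not say what the two parameters of _0x5a8f25 mean, so the arguments passed below are arbitrary.

```javascript
// The real signer lives inside a closure in the copied script.
// `innerSign` is a stand-in (assumption); it does NOT compute X-Bogus.
(function vendorBundle() {
  function innerSign(first, second) {
    return `signed(${first.length},${second.length})`;
  }
  // One added line inside the bundle makes the function reachable from outside.
  globalThis.getXBogus = innerSign;
})();

// Outside the bundle, the exposed function can now be called directly.
const signature = globalThis.getXBogus('aid=6383&count=18', 'Mozilla/5.0');
console.log(signature);
```

In the real file, the added line would assign _0x5a8f25 in place of `innerSign`, at a point where that name is in scope.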

function _0x5a8f25(_0x48914f, _0xa771aa) {
        return ('undefined' == typeof window ? global : window)['_$webrt_1668687510']('484e4f4a403f52430017211b45bdadd5a9f8450800000000000007fa1b0002012f1d00921b000b191b000b02402217000a1c1b000b1926402217000c1c1b000b190200004017002646000306000e271f001b000200021d00920500121b001b000b031b000b19041d0092071b000b0401220117000b1c1b000b051e01301700131b00201d00741b000b06260a0000101c1b000b07260a0000101c1b001b000b081e01311d00931b001b000b091e00091d00941b0048021d00951b001b000b1d1d009d1b0048401d009e1b001b000b031b000b18041d00d51b001b000b0a221e0132241b000b031b000b0a221e0132241b000b200a000110040a0001101d00d71b001b000b0a221e0132241b000b031b000b0a221e0132241b000b1a0a000110040a0001101d00d91b000b0b1e00161e01330117002d1b000b0b1e001602000025001d11221e006e24131e00530201340200701a020200000a000210001d01331b001b000b0c1e00101d00da1b000b232217000e1c211b000b23430201353e1700151b001b000b23221e0133240a0000101d00da1b001b000b0d261b000b1c1b000b1b0a0002101d00db1b001b000b0e261b000b241b000b230a0002101d00dd1b001b000b0f261b000b250200200a0002101d00e11b001b000b0a221e0132241b000b031b000b26040a0001101d00e21b001b000b101a00221e00dc240a0000104903e82b1d00e31b001b000b11260a0000101d00e41b001b000b1f1d00e71b001b000b1c4901002b1d00e81b001b000b1c4901002c1d00ea1b001b000b1b1d00ee1b001b000b21480e191d00f31b001b000b21480f191d00f91b001b000b22480e191d00fa1b001b000b22480f191d00fc1b001b000b27480e191d00ff1b001b000b27480f191d01011b001b000b284818344900ff2f1d01021b001b000b284810344900ff2f1d01041b001b000b284808344900ff2f1d01361b001b000b2848003


Origin blog.csdn.net/u013046615/article/details/134583508