JS reverse engineering: a practical case study of getting past cookie-based anti-crawler checks (a local government website)


Preface

Cookie-based anti-crawling is a server-side technique that distinguishes normal users from crawler programs by validating the Cookie value in the request header. It is widely used in web applications.
This time, we mainly analyze websites whose cookie values are encrypted.
Learn how to handle cookies and sessions.
Learn how to obtain the cookie values returned by the home page.


Statement
All content in this article is for learning and discussion only and must not be used for any other purpose. The complete code is not provided, and the captured packets, sensitive URLs, data interfaces, and so on have been desensitized. Commercial or illegal use is strictly prohibited; the author bears no responsibility for any consequences arising from such use.
This article may not be reproduced without permission or redistributed after modification. The author is not responsible for any incidents caused by unauthorized use of the techniques explained here. If there is any infringement, please contact the author and the content will be deleted immediately. Please abide by the relevant laws and regulations.


1. Cookie anti-crawler

1.1 Characteristics

Cookie encryption generally has one characteristic: obtaining the data requires more than one request to the server.
It comes in two forms:

  • 1. You access the server directly, and the server returns a cookie value in the response headers (usually under a header named Set-Cookie).
  • 2. On the first request, the server returns some JS files. The browser runs the JS algorithm to generate a cookie value, and the second request carries that JS-generated cookie. Only then does the server return the normal data (this form is the more common one).
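
Case 1 can be recognized from the first response alone, because the cookies you need show up in its Set-Cookie headers. Below is a minimal sketch using only the standard library. The helper name `cookies_from_set_cookie` and the sample header values are illustrative assumptions, not data captured from the target site.

```python
from http.cookies import SimpleCookie
from typing import Dict, List


def cookies_from_set_cookie(header_values: List[str]) -> Dict[str, str]:
    """Collect cookie name/value pairs from raw Set-Cookie header values."""
    found: Dict[str, str] = {}
    for raw in header_values:
        jar = SimpleCookie()
        jar.load(raw)  # attributes such as Path or HttpOnly are parsed but not returned
        for name, morsel in jar.items():
            found[name] = morsel.value
    return found


# Made-up header values in the shape a server would send in case 1.
sample = [
    "XSRF-TOKEN=abc123; Path=/",
    "szxx_session=def456; Path=/; HttpOnly",
]
print(cookies_from_set_cookie(sample))
```

If the cookie you need is missing from these headers and the first response is mostly JavaScript, you are probably looking at case 2 instead.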

1.2 Cookie encryption principle

[Figure: diagram of the cookie encryption principle; image not available]

2. Practical analysis

  • Reverse target: a local government website
  • Reverse parameters: the X-Csrf-Token header and the cookie values
  • Reverse interfaces: pubList (cookie) and published?via=pc (X-Csrf-Token)

Analyzing the website
Interface: pubList
Inspecting the request shows that the payload is not encrypted, so the check is probably in the request headers: either the cookie values or the X-Csrf-Token header.
I then wrote a demo to test this.
The test shows that the szxx_session value in the cookie and the X-Csrf-Token value in the request header together determine whether the request succeeds.
The X-Csrf-Token value can be found in the response to the initial document (HTML page) request.
So these two parameters are what we need to obtain.
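
The demo itself is not shown here, but one way to narrow down which cookies or headers the server actually checks is to drop them one at a time and see which removals make the request fail. The following is a minimal sketch of that idea. `find_required_fields` and the `is_accepted` callback are hypothetical names, and `fake_server_check` is a stand-in that simply mirrors the conclusion stated above; in practice the callback would send a real request and inspect the response.

```python
from typing import Callable, Dict, List


def find_required_fields(
    fields: Dict[str, str],
    is_accepted: Callable[[Dict[str, str]], bool],
) -> List[str]:
    """Return the field names whose removal makes the request fail."""
    required = []
    for name in fields:
        # Send everything except this one field.
        trimmed = {k: v for k, v in fields.items() if k != name}
        if not is_accepted(trimmed):
            required.append(name)
    return required


def fake_server_check(sent: Dict[str, str]) -> bool:
    # Stand-in for a real request: accepts only if both values are present.
    return "szxx_session" in sent and "X-CSRF-TOKEN" in sent


candidates = {
    "XSRF-TOKEN": "placeholder",
    "szxx_session": "placeholder",
    "X-CSRF-TOKEN": "placeholder",
    "Pragma": "no-cache",
}
print(find_required_fields(candidates, fake_server_check))
```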

Because the X-Csrf-Token value does not change after a refresh, copy it and search for where it comes from.
The search confirms that the value is embedded in the HTML.
So to get the value, request that page and extract the token with a regular expression.
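
The extraction step can be sketched on its own, without any network access. The pattern `var _CSRF = '...'` is the one used in the full code later in this article; the helper name `extract_csrf_token` and the sample HTML are made up for illustration.

```python
import re
from typing import Optional

# Non-greedy capture of whatever sits between the single quotes.
CSRF_PATTERN = re.compile(r"var _CSRF = '(.*?)';")


def extract_csrf_token(html: str) -> Optional[str]:
    """Return the token embedded in the page, or None if it is absent."""
    match = CSRF_PATTERN.search(html)
    return match.group(1) if match else None


# Illustrative page fragment, not the real site's markup.
sample_html = "<script>\n  var _CSRF = 'example-token-value';\n</script>"
print(extract_csrf_token(sample_html))
```

Returning None when nothing matches makes a layout change on the site easy to detect, instead of silently sending an empty token.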

Next, analyze the cookie values in the browser's Application panel. You can confirm that both cookie values are returned by the backend, which is case 1 described above. So open the first response returned by the server and check whether its headers contain Set-Cookie values and whether they are the ones we need. They are indeed the values we need.
After that it is simple: request the page and extract both parts.

The code is as follows

import requests
import re

headers = {
    "Accept": "application/json, text/plain, */*",
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
    "Cache-Control": "no-cache",
    "Connection": "keep-alive",
    "Content-Type": "application/x-www-form-urlencoded",
    "Origin": "http://www.zjmazhang.gov.cn",
    "Pragma": "no-cache",
    "Referer": "http://www.zjmazhang.gov.cn/",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36",
    "X-CSRF-TOKEN": "LeeXVPsnXRIFt1SKxeuKyfptfSvcRaw1aCkfO5D1"
}

def get_index():
    headers = {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
        "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
        "Cache-Control": "no-cache",
        "Connection": "keep-alive",
        "Pragma": "no-cache",
        "Upgrade-Insecure-Requests": "1",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
    }
    url = "http://www.zjmazhang.gov.cn/hdjlpt/published?via=pc"
    response = requests.get(url, headers=headers)
    # print(response.cookies)
    XSRF_TOKEN = response.cookies.get('XSRF-TOKEN')
    szxx_session = response.cookies.get('szxx_session')
    # With re.S, "." also matches newlines, so the pattern is matched against the whole text and may span line breaks.
    X_Csrf_Token = re.findall("var _CSRF = '(.*?)';", response.text, re.S)
    print(X_Csrf_Token)
    return XSRF_TOKEN, szxx_session, X_Csrf_Token[0] if X_Csrf_Token else ''

def get_data():
    XSRF_TOKEN, szxx_session, X_Csrf_Token = get_index()
    headers['X-CSRF-TOKEN'] = X_Csrf_Token
    url = 'http://www.zjmazhang.gov.cn/hdjlpt/letter/pubList'
    cookies = {
        "XSRF-TOKEN": XSRF_TOKEN,
        "szxx_session": szxx_session
    }

    print(headers)
    data = {
        "offset": "0",
        "limit": "20",
        "site_id": "759010",
        "time_from": "1665676800",
        "time_to": "1697212799"
    }
    response = requests.post(url, headers=headers, cookies=cookies, data=data, verify=False)

    print(response.text)
    print(response)

if __name__ == '__main__':
    get_data()

The results are as follows
[Figure: screenshot of the program output; image not available]
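
As an alternative sketch of the same two-step flow, `requests.Session` can carry the cookies from the first response into the second request automatically, so they do not have to be copied by hand. This is an untested variation under assumptions: the helper names `build_api_headers` and `fetch_pub_list` are made up, and whether the site accepts this reduced header set has not been verified. The URLs, token pattern, and payload values are the ones from the code above.

```python
import re
from typing import Dict

BASE = "http://www.zjmazhang.gov.cn/hdjlpt"
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
)


def build_api_headers(token: str) -> Dict[str, str]:
    """Headers for the pubList call; the token comes from the index page."""
    return {
        "Accept": "application/json, text/plain, */*",
        "Content-Type": "application/x-www-form-urlencoded",
        "Referer": "http://www.zjmazhang.gov.cn/",
        "X-CSRF-TOKEN": token,
    }


def fetch_pub_list(offset: int = 0, limit: int = 20) -> str:
    import requests  # third-party; imported here so the helper above has no dependency

    with requests.Session() as session:
        session.headers["User-Agent"] = USER_AGENT
        # Step 1: the index page sets the cookies (stored on the session)
        # and embeds the CSRF token in its HTML.
        index = session.get(f"{BASE}/published?via=pc", timeout=10)
        index.raise_for_status()
        match = re.search(r"var _CSRF = '(.*?)';", index.text)
        if match is None:
            raise RuntimeError("CSRF token not found in the index page")
        # Step 2: the session sends the stored cookies along automatically.
        data = {
            "offset": str(offset),
            "limit": str(limit),
            "site_id": "759010",
            "time_from": "1665676800",
            "time_to": "1697212799",
        }
        response = session.post(
            f"{BASE}/letter/pubList",
            headers=build_api_headers(match.group(1)),
            data=data,
            timeout=10,
        )
        response.raise_for_status()
        return response.text
```

Calling `fetch_pub_list()` performs both requests and returns the response body as text. Nothing is sent until the function is called.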

Final words:
My writing ability is limited. If any explanation is unclear or wrong, please point it out in the comment area so we can improve together. If you would like to discuss the code or the explanations, you can add me on WeChat 18847868809

Origin blog.csdn.net/m0_52336378/article/details/133816337