[2020-10-13] Data acquisition of dishonest persons on a website

Disclaimer: This article is for study and research only and must not be used for illegal purposes; you bear all responsibility for any misuse. If anything here infringes your rights, please let me know and I will remove it. Thank you!



Project background:


Website: aHR0cDovL3p4Z2suY291cnQuZ292LmNuL3NoaXhpbi8=
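
The address above is Base64-encoded. A minimal sketch for decoding it with Python's standard library (the variable names are only illustrative):

```python
import base64

# The encoded address exactly as given in this article.
encoded = "aHR0cDovL3p4Z2suY291cnQuZ292LmNuL3NoaXhpbi8="

# Decode the Base64 text back into a readable URL.
site_url = base64.b64decode(encoded).decode("utf-8")
print(site_url)
```

The decoded value should match the Referer header used in the complete script.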

Today I'll show how to collect data from a website that lists dishonest persons. The address is given above in lightly obscured form; you'll know what to do with it.


Solution:


1. The main obstacle on this website is the captcha (verification code). To recognize it, we first need to fetch the captcha image.


Clicking the captcha to refresh it triggers a new request that carries two parameters, captchaId and random, so we need to find out how these two parameters are generated.

From the request's call stack, click straight into the familiar-looking refresh function. Once inside, it is clear at a glance how the two parameters are generated, and we can extract the relevant JS:

// Build a random 32-character alphanumeric string (used as captchaId)
function getNum() {
    var chars = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A',
        'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
        'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y',
        'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k',
        'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w',
        'x', 'y', 'z'];
    var nums = "";
    for (var i = 0; i < 32; i++) {
        var id = parseInt(Math.random() * 61);
        nums += chars[id];
    }
    return nums;
}

// Refresh the captcha
function refresh() {
    var randomNumber = Math.random();
    var uuid = getNum();

    return {randomNumber: randomNumber, uuid: uuid}
}
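
Since both values are just random data, the same parameters can probably be produced without a JS runtime. The following is a rough pure-Python sketch of the extracted functions; the names get_num and CHARS are my own, and whether the server cares about the exact formatting of the random value is not something this article establishes.

```python
import random
import string

# Same 62-character alphabet as the page's getNum(): digits, A-Z, a-z.
CHARS = string.digits + string.ascii_uppercase + string.ascii_lowercase


def get_num(length=32):
    """Return a random alphanumeric string, mirroring the page's getNum()."""
    # The page computes parseInt(Math.random() * 61), so only indexes 0..60
    # are used and the last character ('z') is never picked.
    # randrange(61) reproduces that index range.
    return "".join(CHARS[random.randrange(61)] for _ in range(length))


def refresh():
    """Return the two values sent when the captcha image is refreshed."""
    return {"randomNumber": random.random(), "uuid": get_num()}


params = refresh()
print(params)
```

The uuid becomes the captchaId parameter and randomNumber becomes random.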

2. Next, we download the captcha image and verify the code. Let's first look at the requests the web page makes.


As the captured request shows, the verification request must carry the captchaId that was used when fetching the captcha image, plus the recognized code pCode. Let's write some code to try it.
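
To make the relationship between the two requests concrete, here is a small sketch that only builds the two URLs and sends nothing. The endpoint names come from the script in this article; the helper names and the sample values are hypothetical.

```python
from urllib.parse import urlencode

BASE = "http://zxgk.court.gov.cn/shixin/"


def captcha_url(captcha_id, random_number):
    """URL that returns the captcha image for a given captchaId."""
    query = urlencode({"captchaId": captcha_id, "random": random_number})
    return BASE + "captchaNew.do?" + query


def check_url(captcha_id, p_code):
    """URL that verifies a recognized code against the same captchaId."""
    query = urlencode({"captchaId": captcha_id, "pCode": p_code})
    return BASE + "checkyzm.do?" + query


# Hypothetical values, for illustration only.
print(captcha_url("abc123", 0.5))
print(check_url("abc123", "x7k2"))
```

The point is that the same captchaId appears in both URLs, which is how the server can tie the typed code to the image it served.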

3. The code for requesting and verifying the captcha

Here the captcha is recognized manually. Automatic recognition with machine learning requires labeling samples and training a model, a step I am skipping out of laziness. There are many machine-learning recognition methods online; here is a [link](https://blog.csdn.net/qq_26079939/article/details/109050936) you can refer to.

Note: the captcha request and the verification request must be made in the same session.

def get_param():
    js_str = '''function getNum() {
        var chars = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A',
            'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
            'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y',
            'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k',
            'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w',
            'x', 'y', 'z'];
        var nums = "";
        for (var i = 0; i < 32; i++) {
            var id = parseInt(Math.random() * 61);
            nums += chars[id];
        }
        return nums;
    }
    
    // Refresh the captcha
    function refresh() {
        var randomNumber = Math.random();
        var uuid = getNum();
    
        return {randomNumber: randomNumber, uuid: uuid}
    }'''
    js = execjs.compile(js_str)

    return js.call('refresh')


def check_yzm(uuid, randomNumber):
    # `headers` is the module-level dict defined in the complete script below.
    params = (
        ('captchaId', uuid),
        ('random', randomNumber),
    )
    # Use one session so both requests share the same cookies.
    session = requests.session()
    # Request the captcha image
    response = session.get('http://zxgk.court.gov.cn/shixin/captchaNew.do', headers=headers, params=params, verify=False)
    with open('yzm.png', 'wb') as f:
        f.write(response.content)

    print('Enter the captcha shown in yzm.png:')
    pCode = input()
    params = (
        ('captchaId', uuid),
        ('pCode', pCode),
    )
    # Verify the captcha
    response = session.get('http://zxgk.court.gov.cn/shixin/checkyzm.do', headers=headers, params=params, verify=False)

    # The script treats a response body of '1' as success.
    if response.text.strip() == '1':
        print('Captcha correct')
        return [1, pCode]
    else:
        print('Captcha incorrect')
        return [0, pCode]

Run it and enter the captcha when prompted.

4. Next we fetch the data. First, look at the request URL and parameters.

You can see that pCode and captchaId are the code entered earlier and the ID used when requesting the captcha image.

Then we send the request from code. The id of each returned record is used to request its details page.
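
Judging by the complete script in this article, the list response is a JSON array whose first element holds totalSize, totalPage and a result list with id and iname per record. A minimal parsing sketch under that assumption follows; the sample body is invented for illustration and real responses likely carry more fields.

```python
import json

# Hypothetical response body; the values are made up for this sketch.
sample_text = json.dumps([{
    "totalSize": 2,
    "totalPage": 1,
    "result": [
        {"id": 1001, "iname": "Example Co."},
        {"id": 1002, "iname": "Sample Ltd."},
    ],
}])


def parse_list(text):
    """Return (total_size, total_page, [(id, iname), ...]) from a list response."""
    page = json.loads(text)[0]  # the payload sits in a one-element JSON array
    records = [(item["id"], item["iname"]) for item in page["result"]]
    return page["totalSize"], page["totalPage"], records


total_size, total_page, records = parse_list(sample_text)
print(total_size, total_page, records)
```

Each id in records is what gets passed on to the details request.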

5. Next comes the details page, whose request also requires the pCode and captchaId parameters.

6. Finally, we combine the whole process. The complete code follows. It fetches only part of the data; if you want more, modify it yourself.

import requests
import execjs
import json

def get_param():
    js_str = '''function getNum() {
        var chars = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A',
            'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
            'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y',
            'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k',
            'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w',
            'x', 'y', 'z'];
        var nums = "";
        for (var i = 0; i < 32; i++) {
            var id = parseInt(Math.random() * 61);
            nums += chars[id];
        }
        return nums;
    }
    
    // Refresh the captcha
    function refresh() {
        var randomNumber = Math.random();
        var uuid = getNum();
    
        return {randomNumber: randomNumber, uuid: uuid}
    }'''
    js = execjs.compile(js_str)

    return js.call('refresh')


def check_yzm(uuid, randomNumber):
    # `headers` is the module-level dict defined in the __main__ block.
    params = (
        ('captchaId', uuid),
        ('random', randomNumber),
    )
    # Use one session so both requests share the same cookies.
    session = requests.session()
    # Request the captcha image
    response = session.get('http://zxgk.court.gov.cn/shixin/captchaNew.do', headers=headers, params=params, verify=False)
    with open('yzm.png', 'wb') as f:
        f.write(response.content)

    print('Enter the captcha shown in yzm.png:')
    pCode = input()
    params = (
        ('captchaId', uuid),
        ('pCode', pCode),
    )
    # Verify the captcha
    response = session.get('http://zxgk.court.gov.cn/shixin/checkyzm.do', headers=headers, params=params, verify=False)

    # The script treats a response body of '1' as success.
    if response.text.strip() == '1':
        print('Captcha correct')
        return [1, pCode]
    else:
        print('Captcha incorrect')
        return [0, pCode]

# Fetch the list data
def get_data(pCode, captchaId):
    data = {
        'pName': '杭州',  # search keyword (Hangzhou)
        'pCardNum': '',
        'pProvince': '0',
        'pCode': pCode,
        'captchaId': captchaId,
        'currentPage': '1'  # only the first page is requested
    }
    response = requests.post('http://zxgk.court.gov.cn/shixin/searchSX.do', headers=headers, data=data, verify=False)
    json_data = json.loads(response.text)[0]
    print('\n', 'Total records:', json_data['totalSize'], 'Total pages:', json_data['totalPage'])
    for info in json_data['result']:
        print(info['id'], info['iname'])
        get_detail(info['id'], pCode, captchaId)
        # break

# Fetch the details of one record
def get_detail(record_id, pCode, captchaId):
    params = (
        ('id', str(record_id)),
        # Note: the case number is hard-coded to a single sample case here.
        ('caseCode', '\uFF082019\uFF09\u6D590108\u62672318\u53F7'),
        ('pCode', pCode),
        ('captchaId', captchaId),
    )
    response = requests.get('http://zxgk.court.gov.cn/shixin/disDetailNew', headers=headers, params=params, verify=False)
    print(response.text)

if __name__ == '__main__':
    # Defined at module level, so the functions above can read it as a global.
    headers = {
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'zh-CN,zh;q=0.9',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.87 Safari/537.36',
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Referer': 'http://zxgk.court.gov.cn/shixin/',
        'X-Requested-With': 'XMLHttpRequest',
    }
    # Keep asking for a fresh captcha until one is verified successfully.
    while True:
        data = get_param()  # generate the request parameters
        print(data)
        uuid = str(data['uuid'])
        randomNumber = str(data['randomNumber'])
        check_flag = check_yzm(uuid, randomNumber)  # verify the captcha
        if check_flag[0] == 1:
            get_data(check_flag[1], uuid)
            break
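
The script above requests only currentPage 1. One way to collect more, sketched below, is to loop until totalPage is reached. Here fetch_page is a hypothetical callable standing in for a function that posts to searchSX.do with the given page number and returns the parsed first element of the response. Whether one verified captcha stays valid across many pages is not something this article establishes, so a real version may need to re-verify along the way.

```python
def iter_all_records(fetch_page):
    """Yield every record across all pages.

    fetch_page(page_number) is a hypothetical callable returning a dict
    with at least the keys 'totalPage' and 'result'.
    """
    page_number = 1
    while True:
        page = fetch_page(page_number)
        yield from page["result"]
        # Stop once the last page reported by the server has been read.
        if page_number >= int(page["totalPage"]):
            break
        page_number += 1


# Stand-in for real requests so the sketch runs offline.
def fake_fetch(page_number):
    pages = {
        1: {"totalPage": 2, "result": [{"id": 1, "iname": "A"}]},
        2: {"totalPage": 2, "result": [{"id": 2, "iname": "B"}]},
    }
    return pages[page_number]


all_records = list(iter_all_records(fake_fetch))
print(all_records)
```

Passing the fetcher in as a callable keeps the paging logic separate from the network code, which makes it easy to try out with canned data first.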



Origin blog.csdn.net/qq_26079939/article/details/109053197