Logging in to the 12306 railway website with a Python requests crawler

This article uses Python's third-party library requests to log in to the 12306 website, including submitting the captcha.

The captcha is entered manually here, not recognized and submitted automatically. Automatic recognition would require image processing and pattern recognition, which I have not studied.

Implementation steps:

1. Create a session. Logging in is impossible with bare requests.get() and requests.post() calls. Each such call is independent and does not keep the cookies the server sets, so on the next request the server no longer recognizes you, and the captcha image, your answer and the login that follows cannot be tied together. Anyone who has studied web development will recognize this.
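A minimal sketch of why the session matters, using only the public requests API. A cookie stored in the session's jar (here a hypothetical `JSESSIONID=demo`, standing in for whatever 12306 actually sets) is attached automatically to the next request for that host, while a request prepared on its own carries no cookie. `prepare_request` builds the outgoing request without sending it, so this runs offline.

```python
import requests

LOGIN_PAGE = 'https://kyfw.12306.cn/otn/login/init'
USER_AGENT = ('Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36')


def make_session() -> requests.Session:
    """Create a session that keeps cookies and sends a browser-like User-Agent."""
    session = requests.Session()
    session.headers.update({'User-Agent': USER_AGENT})
    return session


def cookie_header_for(session: requests.Session, url: str):
    """Return the Cookie header the session would send with a GET of `url` (no network I/O)."""
    prepared = session.prepare_request(requests.Request('GET', url))
    return prepared.headers.get('Cookie')


if __name__ == '__main__':
    session = make_session()
    # Hypothetical cookie, standing in for one the server set on an earlier response.
    session.cookies.set('JSESSIONID', 'demo', domain='kyfw.12306.cn', path='/')
    print(cookie_header_for(session, LOGIN_PAGE))  # the session re-sends the stored cookie
    # A standalone request has no cookie jar behind it, so no cookie is sent.
    print(requests.Request('GET', LOGIN_PAGE).prepare().headers.get('Cookie'))
```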

2. Use the session to open the login page.

Then open the 12306 login page in a browser and open the browser's developer tools (Inspect Element) to watch the requests the page makes.

3. Fetch the captcha image and save it to a local file.
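A sketch of this step as a reusable function. The trailing random number in the article's captcha URL appears to be a cache-buster; that is an assumption, so the sketch appends a fresh `random.random()` value on each call. `session` is any object with a requests-style `get` method, such as the `requests.Session` from step 1.

```python
import random

CAPTCHA_BASE = ('https://kyfw.12306.cn/passport/captcha/captcha-image'
                '?login_site=E&module=login&rand=sjrand')


def captcha_url() -> str:
    """Captcha URL with a fresh random suffix (assumed to be only a cache-buster)."""
    return f'{CAPTCHA_BASE}&{random.random()}'


def save_captcha(session, path: str = 'captcha.jpg') -> int:
    """Download the captcha through `session` so it is tied to the session's cookies."""
    response = session.get(captcha_url(), timeout=10)
    response.raise_for_status()
    with open(path, 'wb') as f:  # binary mode: the body is raw image bytes
        f.write(response.content)
    return len(response.content)  # number of bytes written
```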

4. Enter the captcha manually

The difficulty is that the 12306 captcha is answered by clicking on pictures, not by typing letters or digits.

The solution is:

The data in the red box in the figure above is the captcha answer submitted to the server. It consists of pixel coordinates within the captcha image: one point for each correct picture among the 8 pictures.

We therefore build a dictionary that stores a pixel coordinate for each of the 8 pictures, plus a function that turns the numbers of the selected pictures into the coordinates to submit.
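For illustration, a slightly more defensive version of the dictionary and function from the complete code: it tolerates spaces around the numbers and raises a clear error for anything outside 1 to 8 instead of a bare `KeyError`. The coordinates are the author's measured values; the dictionary is named `TILE_COORDS` here so it does not shadow Python's built-in `map`.

```python
# Author's measured coordinates: pictures 1-4 form the top row, 5-8 the bottom row.
TILE_COORDS = {
    '1': '35,40', '2': '107,45', '3': '175,43', '4': '254,43',
    '5': '37,115', '6': '110,118', '7': '177,117', '8': '255,119',
}


def make_answer(selection: str) -> str:
    """Turn picture numbers such as '2,4' into the comma-joined coordinates to submit."""
    numbers = [part.strip() for part in selection.split(',') if part.strip()]
    invalid = [n for n in numbers if n not in TILE_COORDS]
    if not numbers or invalid:
        raise ValueError(f'expected picture numbers 1-8 separated by commas, got {selection!r}')
    return ','.join(TILE_COORDS[n] for n in numbers)


print(make_answer('2, 4'))  # → 107,45,254,43
```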

How do you find the pixel coordinates of each picture?

Open the saved captcha image in your computer's drawing program and read off a point inside each picture. Generally, subtract 30 from the y coordinate, because the instruction text above the pictures ("please click all the ... in the pictures below") is not counted. For example, a point that reads (40,69) should be submitted as (40,39) or (40,40); anything nearby works.
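The adjustment described above as a tiny helper. The 30-pixel banner height is the author's rough estimate, so treat it as an assumption; the helper only shifts the y coordinate and rejects points that fall inside the banner.

```python
from typing import Tuple

BANNER_HEIGHT = 30  # assumed height (px) of the instruction strip above the pictures


def to_answer_coords(x: int, y: int, banner_height: int = BANNER_HEIGHT) -> Tuple[int, int]:
    """Convert a point read from the saved image into the coordinates the server expects."""
    if y < banner_height:
        raise ValueError('point lies in the instruction banner, not on a picture')
    return x, y - banner_height


print(to_answer_coords(40, 69))  # → (40, 39)
```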

To enter the captcha manually, type the numbers of the correct pictures separated by commas (pictures 1 to 4 are the top row and 5 to 8 the bottom row, matching the dictionary). In the example above the correct pictures are the second and fourth, so you enter: 2,4

The function then converts this input into the correct answer format, and the answer is verified by submitting it as a form.
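A sketch of the form submission, using the endpoint and field values from the complete code (`login_site='E'`, `rand='sjrand'`). Building the form in its own function makes it easy to inspect what will be sent; `session` is any requests-style session, and it must be the same one that fetched the image.

```python
CAPTCHA_CHECK_URL = 'https://kyfw.12306.cn/passport/captcha/captcha-check'


def captcha_form(answer: str) -> dict:
    """Form fields posted to captcha-check; `answer` is the comma-joined coordinates."""
    return {'answer': answer, 'login_site': 'E', 'rand': 'sjrand'}


def submit_captcha(session, answer: str) -> dict:
    """POST the answer with the session that fetched the image; return the JSON reply."""
    response = session.post(CAPTCHA_CHECK_URL, data=captcha_form(answer), timeout=10)
    response.raise_for_status()
    return response.json()
```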

5. Verify the captcha

Extract the result code from the returned response to determine whether verification succeeded.

Adding print(captcha_check_response.text) prints the raw reply, which shows the result_code values for success and failure; the checks in the complete code are based on them (4 means success, 7 means the captcha has expired). The username and password login that follows is checked the same way.
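The result-code check can be isolated in a pure function and tested without the network. The codes '4' (success) and '7' (expired) come from the complete code; normalizing with `str()` is an added assumption in case the code ever arrives as a number.

```python
CAPTCHA_OK = '4'       # success, per the complete code
CAPTCHA_EXPIRED = '7'  # captcha expired, per the complete code


def captcha_status(payload: dict) -> str:
    """Map the captcha-check JSON reply to 'ok', 'expired' or 'failed'."""
    code = str(payload.get('result_code'))
    if code == CAPTCHA_OK:
        return 'ok'
    if code == CAPTCHA_EXPIRED:
        return 'expired'
    return 'failed'


print(captcha_status({'result_code': '4'}))  # → ok
```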

6. Once the captcha has been verified, submit the username and password. Find the URL they are submitted to in the browser's developer tools.
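A sketch of the credential step with the URL and fields from the complete code; the `result_code == 0` success check is the one the complete code uses. Unlike the listing, it adds a hypothetical `prompt_credentials` helper that reads the password with the standard library's `getpass` instead of hard-coding it.

```python
import getpass

LOGIN_URL = 'https://kyfw.12306.cn/passport/web/login'


def login_form(username: str, password: str) -> dict:
    """Form fields posted to the login endpoint."""
    return {'username': username, 'password': password, 'appid': 'otn'}


def prompt_credentials():
    """Ask for credentials at run time instead of hard-coding them (password is not echoed)."""
    return input('Username: '), getpass.getpass('Password: ')


def login(session, username: str, password: str) -> bool:
    """Submit credentials with the captcha-verified session; True if result_code is 0."""
    response = session.post(LOGIN_URL, data=login_form(username, password), timeout=10)
    response.raise_for_status()
    return response.json().get('result_code') == 0
```

With the session from the earlier steps, usage would be `ok = login(session, *prompt_credentials())`.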

7. On most other websites the login would be finished at this point, but 12306 still requires obtaining a token before the login is complete.

Log in through the web page in the browser. First, find the uamtk request in the developer tools.

Second, find the uamauthclient request, which usually appears right after uamtk.
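A sketch of the two-step token exchange, using the endpoints and fields from the complete code: POST `appid=otn` to uamtk, take `newapptk` from a reply whose `result_code` is 0, then POST it as `tk` to uamauthclient. The token extraction is a pure function so it can be checked offline; `authorize` is a hypothetical name for the wrapper.

```python
from typing import Optional

UAMTK_URL = 'https://kyfw.12306.cn/passport/web/auth/uamtk'
UAMAUTHCLIENT_URL = 'https://kyfw.12306.cn/otn/uamauthclient'


def extract_tk(uamtk_reply: dict) -> Optional[str]:
    """Return 'newapptk' if the uamtk reply reports success (result_code 0), else None."""
    if uamtk_reply.get('result_code') == 0:
        return uamtk_reply.get('newapptk')
    return None


def authorize(session) -> dict:
    """Fetch the app token from uamtk and hand it to uamauthclient; return its JSON reply."""
    uamtk_reply = session.post(UAMTK_URL, data={'appid': 'otn'}, timeout=10).json()
    tk = extract_tk(uamtk_reply)
    if tk is None:
        raise RuntimeError('could not obtain the uamtk token')
    return session.post(UAMAUTHCLIENT_URL, data={'tk': tk}, timeout=10).json()
```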

The corresponding code is the uamtk and uamauthclient part of the complete code below.

At this point, the login is successful.

Complete code:

```python
import requests

# 1. Create a session so cookies persist across requests
session = requests.Session()
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36'}
session.headers.update(headers)

# 2. Open the 12306 login page
url = 'https://kyfw.12306.cn/otn/login/init'
response = session.get(url)

# 3. Fetch the captcha image and save it to a local file
captcha_url = 'https://kyfw.12306.cn/passport/captcha/captcha-image?login_site=E&module=login&rand=sjrand&0.8006430591776557'
captcha_response = session.get(captcha_url)
with open('captcha.jpg', 'wb') as f:
    f.write(captcha_response.content)

# 4. Submit the captcha for verification
# Pixel coordinates of a point inside each of the 8 captcha pictures
# (named tile_coords so it does not shadow the built-in map())
tile_coords = {
    '1': '35,40',
    '2': '107,45',
    '3': '175,43',
    '4': '254,43',
    '5': '37,115',
    '6': '110,118',
    '7': '177,117',
    '8': '255,119'
}

# Convert input such as "2,4" into the submission format "107,45,254,43"
def make_answer(numpict):
    numbers = [n.strip() for n in numpict.split(',')]
    return ','.join(tile_coords[n] for n in numbers)

# Enter the captcha manually
captcha_answer = make_answer(input('Enter the numbers of the correct pictures (e.g. 2,4): '))
captcha_check_url = 'https://kyfw.12306.cn/passport/captcha/captcha-check'
form_data = {
    'answer': captcha_answer,  # captcha answer: pixel coordinates of the correct pictures
    'login_site': 'E',
    'rand': 'sjrand'
}
# Verify the captcha
captcha_check_response = session.post(captcha_check_url, data=form_data)
result_code = captcha_check_response.json()['result_code']  # result code tells whether verification succeeded
if result_code != '4':
    if result_code == '7':
        print('The captcha has expired')
    else:
        print('Captcha verification failed')
else:
    # Captcha verified: submit username and password to log in
    login_url = 'https://kyfw.12306.cn/passport/web/login'
    username = 'xxxxxxxxxx'
    password = 'xxxxxx'
    form_data = {
        'username': username,
        'password': password,
        'appid': 'otn'
    }
    login_response = session.post(login_url, data=form_data)
    if login_response.json()['result_code'] == 0:
        # Credentials accepted: obtain the auth token
        uamtk_url = 'https://kyfw.12306.cn/passport/web/auth/uamtk'
        form_data = {'appid': 'otn'}
        uamtk_response = session.post(uamtk_url, data=form_data)
        if uamtk_response.json()['result_code'] == 0:
            uamauthclient_url = 'https://kyfw.12306.cn/otn/uamauthclient'
            form_data = {'tk': uamtk_response.json()['newapptk']}
            uamauthclient_response = session.post(uamauthclient_url, data=form_data)
            print(uamauthclient_response.json())
            print('Login successful')  # only reached once the token exchange succeeds
        else:
            print('Failed to obtain the auth token')
    else:
        print('Login failed: wrong username or password')
```

 


Source: blog.csdn.net/Thanours/article/details/83575581