Simulating Baidu login with Python

Table of contents

Description
Preparing the environment
Login process analysis
Complete login code
Validity test

Description

This article implements Baidu login by scanning a QR code. Why QR code login rather than account and password? Mainly because, when logging in with an account and password during testing, clearing the cookies causes a captcha to pop up. That alone would be manageable, but the fatal problem is that, probably because of Baidu's account protection mechanism, Baidu may force a secondary SMS verification during login even when the captcha is entered correctly. What triggers this is not yet clear.

Preparing the environment

Prepare a python3 environment and install the requests library and the Chrome browser. For basic usage of the requests library, see the referenced article on requests as a crawler tool.


Login process analysis

The key point: Baidu uses cookies to verify a user's identity. After a successful login, the same cookies grant access to all related Baidu sites, including Baidu Tieba, Baidu Netdisk, and so on, without logging in again. A simulated login therefore succeeds once we obtain the cookies used for identity verification, so our main goal is to find which requests set those cookies.

Step one: open the Chrome browser and clear all Baidu-related cookies. If you were already logged in, deleting the cookies puts you in a logged-out state.

Step two: visit the Baidu home page again and clear any existing cookies, to guarantee that no cookies exist before the 登陆 (Login) button is clicked. Then press F12 to open the browser developer tools, click the Login button, choose QR code login, and inspect the recorded requests.


Excluding css/img resources, the main requests are:

No. Link
1 uni_login_wrapper.js...
2 _blank.html
3 ?getapi&tpl=mn...
4 getqrcode?lp=pc...
5 viewlog?ak=1e3f2...
6 ?loginhistory&token=a7...
7 unicast?channel_id=d15...

Inspecting the uni_login_wrapper... request shows that its response headers set the BAIDUID cookie, and that every subsequent request carries this BAIDUID cookie.


Inspecting _blank.html shows that its response headers do not set any cookie.


Inspecting the ?getapi request shows that its response headers set the cookie HOSUPPORT=1 and that it returns JSON data containing a token.


The main purpose of this request is to obtain the token, which is a parameter of the later ?loginhistory&token=... request.
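As a minimal sketch of pulling the token out of such a response, the helper below anchors the search on the token key instead of taking the first 32-character run. The sample response string, its field layout, and the function name extract_token are assumptions made for illustration; they are not captured Baidu output.

```python
import re

# Matches "token": "<value>" with either quote style (assumed response layout)
TOKEN_RE = re.compile(r'''["']token["']\s*:\s*["'](\w+)["']''')

def extract_token(response_text):
    """Return the value of the token field, or None if it is absent."""
    match = TOKEN_RE.search(response_text)
    return match.group(1) if match else None

# Hypothetical response text, for illustration only
sample = ('bd__cbs__5kjmhe({"errInfo": {"no": "0"}, '
          '"data": {"token": "0123456789abcdef0123456789abcdef"}})')
token = extract_token(sample)
```

Returning None rather than raising lets the caller decide how to react when the response does not have the expected shape.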


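Inspecting the getqrcode?lp=pc.. request shows that its response headers set no cookie, but the response body includes a link, which is exactly the request address of the QR code image. Accessing the imgurl in the response returns the QR code. The response also contains a value named sign, which is the channel_id of the later unicast?channel_id=... request.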
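The relationship between imgurl, sign, and channel_id can be sketched as follows. The field names imgurl and sign come from the analysis above; the sample response, the missing URL scheme on imgurl, and the helper name parse_qrcode_response are assumptions for illustration.

```python
import json
import re

def parse_qrcode_response(jsonp_text):
    """Return (qr_image_url, channel_id) from a callback({...}) response."""
    match = re.search(r'\((.*)\)', jsonp_text, re.S)
    if match is None:
        raise ValueError('not a callback(...) response')
    data = json.loads(match.group(1))
    img_url = data['imgurl']
    # Assumption: imgurl may be returned without a scheme
    if not img_url.startswith(('http://', 'https://')):
        img_url = 'https://' + img_url
    # sign doubles as the channel_id of the later unicast request
    return img_url, data['sign']

# Hypothetical response text, for illustration only
sample = ('tangram_guid_1561697778375({"imgurl": '
          '"passport.baidu.com/v2/api/qrcode?sign=abc123&lp=pc", '
          '"sign": "abc123"})')
qr_url, channel_id = parse_qrcode_response(sample)
```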


Inspecting the viewlog?ak=... request shows that its response headers set the pplogid cookie. Testing shows that this cookie is not essential; the login works with or without it.


Inspecting the ?loginhistory&token=... request shows that it sets three cookies: PASSID, UBI, and HISTORY.


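Inspecting the unicast?channel_id=... request shows that, as long as the QR code has not been scanned, the client keeps sending this request. It is the request that checks whether the QR code has been scanned.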


Step three: scan the QR code with the Baidu app and continue analyzing the requests that follow. After the scan, three key requests appear.

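From the previous step we know that the unicast?channel_id=... request checks whether the QR code has been scanned, so once the scan succeeds it must return some important information used for authentication. Compare the return values of the two unicast requests sent after the scan.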


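The responses show that the returned v has the same value as the bduss parameter of the login request qrbdusslogin?v=..&bduss=....
Next, analyze the final login request, qrbdusslogin?v=..&bduss=.... Its response headers contain several cookie values: STOKEN, PTOKEN, and BDUSS. These are the key to an authorized login.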

At this point, once these values have been obtained, the login has succeeded.


Login process code

First define two utility functions: one returns a millisecond-precision timestamp, and the other converts a callback-wrapped response (of the form callback({...})) into JSON so that the information is easy to extract.

import json
import re
import time

# Get a millisecond-precision timestamp
def get_cur_timestamp():
    return int(round(time.time() * 1000))

# Convert a callback(...) response to JSON; only the outer parentheses are stripped
def parsecallback_tojson(callbackstr):
    return json.loads(re.search(r'\((.*)\)', callbackstr, re.S).group(1))

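Websites usually also place restrictions on request headers, so request headers need to be defined as well. They are saved in the config.py file.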

# -*- coding: utf-8 -*-
headers = {
    'passport_headers':{
        'Host': 'passport.baidu.com',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
        'Referer': 'https://www.baidu.com/?tn=62095104_7_oem_dg'
    },
    'tieba_headers':{
        'Host': 'tieba.baidu.com',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
        'Referer': 'https://www.baidu.com/?tn=62095104_7_oem_dg',
        'Origin': 'https://tieba.baidu.com'
    }
}

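**Get BAIDUID:** Following the analysis above, first obtain BAIDUID and define a login_cookies dict to hold the request cookies. The parameters each request needs can all be seen in the browser's developer tools, so they are not detailed here.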

# Cookies used for the login requests
login_cookies = {}
headers = config.headers['passport_headers']
init_url = 'https://passport.baidu.com/passApi/js/uni_login_wrapper.js?cdnversion=1561973784431&_=1561973762531'
init_r = requests.get(init_url, headers=headers, verify=False)
# Get BAIDUID
init_cookies = init_r.cookies
login_cookies['BAIDUID'] = init_cookies['BAIDUID']
 

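**Get the token:** Extract it directly from the response with a regular expression. The parameters tpl, apiver, class, loginversion, and logintype are fixed, tt is the current timestamp, and callback names the callback that receives the response; its format is fixed, so it can be hard-coded.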

# Get the token (gid is the fixed-format random string defined in the complete code)
token_url = 'https://passport.baidu.com/v2/api/?getapi'
t_params = {
    'tpl': 'mn',
    'apiver': 'v3',
    'tt': get_cur_timestamp(),
    'class': 'login',
    'gid': gid,
    'loginversion': 'v4',
    'logintype': 'dialogLogin',
    'traceid': None,
    'callback': 'bd__cbs__5kjmhe'
}
token_r = requests.get(token_url, headers=headers, params=t_params, cookies=init_cookies, verify=False)
token = re.search(r'[\w]{32}', token_r.text).group()

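**Request the QR code, save it locally, and parse signcode:** gid is a randomly generated string in a fixed format, tt and _ are both timestamps, and the other parameters are fixed. This step must parse out signcode, the string that uniquely corresponds to the generated QR code; it is used to build the QR code image URL and for the later scan check.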

# Get the QR code address
qrcode_url = 'https://passport.baidu.com/v2/api/getqrcode'
qr_params = {
    'lp': 'pc',
    'qrloginfrom': 'pc',
    'gid': '6F11F8D-EDD5-4A78-8B51-42D86D2DA7F4',
    'callback': 'tangram_guid_1561697778375',
    'apiver': 'v3',
    'tt': get_cur_timestamp(),
    'tpl': 'mn',
    '_': get_cur_timestamp()
}
qrcode_r = requests.get(qrcode_url, headers=headers, params=qr_params, cookies=init_cookies, verify=False)
# Parse signcode out of the response
signcode = re.search(r'[\w]{32}', qrcode_r.text).group()
qrimg_url = 'https://passport.baidu.com/v2/api/qrcode?sign=%s&lp=pc&qrloginfrom=pc' % signcode
# Save the QR code image to a file
with open('qrcode.jpg', 'wb') as f:
    qr_r = requests.get(qrimg_url, headers=headers, cookies=login_cookies, verify=False)
    f.write(qr_r.content)
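
The code above hard-codes gid. A minimal sketch of generating a fresh value in the same shape could look like the following; the template is inferred only from the sample value '6F11F8D-EDD5-4A78-8B51-42D86D2DA7F4' used in this article, and both the template and the name generate_gid are assumptions rather than Baidu's documented format.

```python
import random

# Assumed template: 'x' is any hex digit, 'y' is one of 8, 9, A, B
GID_TEMPLATE = 'xxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'

def generate_gid(rng=random):
    """Fill the template with random uppercase hex digits."""
    def fill(ch):
        if ch == 'x':
            return '%X' % rng.randrange(16)
        if ch == 'y':
            # Restrict to 8..11, which formats as 8, 9, A or B
            return '%X' % (rng.randrange(4) | 8)
        return ch  # keep '-' and the literal '4'
    return ''.join(fill(ch) for ch in GID_TEMPLATE)

gid = generate_gid()
```

Whether the server accepts any value of this shape has not been verified here; the fixed value from the article remains the known-working choice.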

**history request:** Obtain the three cookies PASSID, UBI, and HISTORY, and store them in login_cookies.

# Get the PASSID, UBI and HISTORY cookies
login_cookies['HOSUPPORT'] = '1'
loginhistory_url = 'https://passport.baidu.com/v2/api/?loginhistory'
loginhistory_params = {
    'token': token,
    'tpl': 'mn',
    'apiver': 'v3',
    'tt': get_cur_timestamp(),
    'loginversion': 'v4',
    'gid': gid,
    'traceid': None,
    'callback': 'bd__cbs__um4fp5'
}
loginhistory_r = requests.get(loginhistory_url, params=loginhistory_params, headers=headers,
                                      cookies=login_cookies, verify=False)
loginhistory_cookiestr = loginhistory_r.headers['Set-Cookie']
passid = None
ubi = None
history = None
# Get PASSID
paasid_searchres = re.search(r'PASSID=(\w*[^;])', loginhistory_cookiestr)
if paasid_searchres:
    passid = paasid_searchres.group(1)
# Get UBI
ubi_searchres = re.search(r'UBI=([\w%-]*[^;])', loginhistory_cookiestr)
if ubi_searchres:
    ubi = ubi_searchres.group(1)
# Get HISTORY
history_searchres = re.search(r'HISTORY=([\w%-]*[^;])', loginhistory_cookiestr)
if history_searchres:
    history = history_searchres.group(1)
login_cookies['PASSID'] = passid
login_cookies['UBI'] = ubi
login_cookies['HISTORY'] = history
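
The three near-identical regular expressions above could be folded into one helper. The sketch below is one possible way to do that, assuming the Set-Cookie header arrives as a single string with several cookies joined by commas; the sample header and the name extract_cookie are hypothetical.

```python
import re

def extract_cookie(set_cookie_header, name):
    """Return the value of cookie `name` from a Set-Cookie string, or None."""
    # The name must start the string or follow ';', ',' or whitespace,
    # so that e.g. 'XPASSID=' does not match a search for 'PASSID'
    pattern = r'(?:^|[;,\s])' + re.escape(name) + r'=([^;,\s]*)'
    match = re.search(pattern, set_cookie_header)
    return match.group(1) if match else None

# Hypothetical header, for illustration only
header = ('PASSID=abc123; path=/; domain=passport.baidu.com, '
          'UBI=fi_Pnc%7ETaJ; path=/, HISTORY=0a1b2c; path=/')
cookies = {name: extract_cookie(header, name)
           for name in ('PASSID', 'UBI', 'HISTORY')}
```

An alternative is to read the values from the response's cookies attribute, as was done for BAIDUID earlier.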

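**Scan the code and get the bdussd parameter:** The QR code was saved locally in the previous step. Now poll the unicast request to check whether the QR code has been scanned; once it has, read the bdussd value from the response.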

# Poll the scan status to check whether the QR code has been scanned
channel_url = 'https://passport.baidu.com/channel/unicast'
chanel_param = {
    'channel_id': signcode,
    'tpl': 'mn',
    'gid': gid,
    'callback': 'tangram_guid_1561776159383',
    'apiver': 'v3',
    'tt': get_cur_timestamp(),
    '_': get_cur_timestamp()
}
while True:
    channel_r = requests.get(channel_url, headers=headers, cookies=login_cookies, verify=False, params=chanel_param)
    channel_r_json = parsecallback_tojson(channel_r.text)
    # Get bdussd. Big pitfall: once the scan is detected, the request must be sent one more time, otherwise bdussd is not returned
    if (channel_r_json['errno'] == 0) and (json.loads(channel_r_json['channel_v'])['status']) == 1:
        channel_r = requests.get(channel_url, headers=headers, cookies=login_cookies, verify=False, params=chanel_param)
        bdussd = json.loads(parsecallback_tojson(channel_r.text)['channel_v'])['v']
        break
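
The loop above never gives up if the QR code is not scanned. A hedged sketch of the same polling idea with a timeout is shown below. The names wait_for_scan and fetch_status are hypothetical: fetch_status stands for any caller-supplied function that sends one unicast request and returns the already-parsed dict. The sketch also assumes that the response carrying v arrives with errno equal to 0.

```python
import json
import time

def wait_for_scan(fetch_status, timeout=120.0, interval=1.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll until a response carries 'v', then return it.

    Raises TimeoutError if nothing arrives within `timeout` seconds.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        reply = fetch_status()
        if reply.get('errno') == 0:
            # channel_v is itself a JSON-encoded string
            channel_v = json.loads(reply['channel_v'])
            if 'v' in channel_v:
                return channel_v['v']
            # status == 1 without 'v' means scanned: keep polling
        sleep(interval)
    raise TimeoutError('QR code was not confirmed in time')
```

Passing sleep and clock as parameters keeps the function testable without real waiting; in normal use the defaults apply, for example wait_for_scan(lambda: parsecallback_tojson(requests.get(...).text)).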
        

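**Get the login authorization cookies:** Use the bdussd value obtained in the previous step to send the login request, obtain the authorization information, and write it to the local file auth_cookies.txt.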

# Log in with bdussd
login_url = 'https://passport.baidu.com/v3/login/main/qrbdusslogin'
login_params = {
    'v': get_cur_timestamp(),
    'bduss': bdussd,
    'u': 'https://www.baidu.com/?tn=62095104_7_oem_dg',
    'qrcode': '1',
    'pl': 'mn',
    'apiver': 'v3',
    'tt': get_cur_timestamp(),
    'traceid': None,
    'callback': 'bd__cbs__raxv3h'
}
login_r = requests.get(login_url, headers=headers, params=login_params, cookies=login_cookies, verify=False)

# Parse the authorization cookies
auth_cookies = {}
loginsuccess_cookiestr = login_r.headers['Set-Cookie']
BDUSS = re.search(r'BDUSS=([\w%-]*[^;])', loginsuccess_cookiestr).group(1)
STOKEN = re.search(r'STOKEN=([\w%-]*[^;])', loginsuccess_cookiestr).group(1)
PTOKEN = re.search(r'PTOKEN=([\w%-]*[^;])', loginsuccess_cookiestr).group(1)
auth_cookies['BDUSS'] = BDUSS
auth_cookies['STOKEN'] = STOKEN
auth_cookies['PTOKEN'] = PTOKEN
with open('auth_cookies.txt', 'w') as f:
    f.write(json.dumps(auth_cookies))
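
Since the cookies are written to auth_cookies.txt, a later run could try to reuse them before asking for a new scan. The following is a minimal sketch of that idea; load_auth_cookies and REQUIRED_COOKIES are hypothetical names, and the function only checks that the file is well formed, not that the cookies are still accepted by Baidu.

```python
import json
import os

# The three cookies the login step writes to the file
REQUIRED_COOKIES = ('BDUSS', 'STOKEN', 'PTOKEN')

def load_auth_cookies(path='auth_cookies.txt'):
    """Return the saved cookie dict, or None if missing or incomplete."""
    if not os.path.exists(path):
        return None
    try:
        with open(path, 'r') as f:
            cookies = json.load(f)
    except ValueError:
        # The file exists but does not contain valid JSON
        return None
    if not isinstance(cookies, dict):
        return None
    if not all(cookies.get(name) for name in REQUIRED_COOKIES):
        return None
    return cookies
```

A caller would fall back to the QR code flow whenever this returns None or a request made with the loaded cookies turns out to be unauthenticated.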

At this point, the whole login process is complete.


Complete code and login validity check

# -*- coding: utf-8 -*-
import requests
import time
import config
import re
import json

# Get a millisecond-precision timestamp
def get_cur_timestamp():
    return int(round(time.time() * 1000))

# Convert a callback(...) response to JSON; only the outer parentheses are stripped
def parsecallback_tojson(callbackstr):
    return json.loads(re.search(r'\((.*)\)', callbackstr, re.S).group(1))
    
class QRLogin:
    @staticmethod
    def get_auth_cookie():
        # Cookies used for the login requests
        login_cookies = {}
        # Randomly generated string in a fixed format
        gid = '6F11F8D-EDD5-4A78-8B51-42D86D2DA7F4'
        headers = config.headers['passport_headers']
        init_url='https://passport.baidu.com/passApi/js/uni_login_wrapper.js?cdnversion=1561973784431&_=1561973762531'
        init_r = requests.get(init_url, headers=headers, verify=False)

        # Get BAIDUID
        init_cookies = init_r.cookies
        login_cookies['BAIDUID'] = init_cookies['BAIDUID']

        # Get the token
        token_url = 'https://passport.baidu.com/v2/api/?getapi'
        t_params = {
            'tpl': 'mn',
            'apiver': 'v3',
            'tt': get_cur_timestamp(),
            'class': 'login',
            'gid': gid,
            'loginversion': 'v4',
            'logintype': 'dialogLogin',
            'traceid': None,
            'callback': 'bd__cbs__5kjmhe'
        }
        token_r = requests.get(token_url, headers=headers, params=t_params, cookies=login_cookies, verify=False)
        token = re.search(r'[\w]{32}', token_r.text).group()

        # Get the QR code address
        qrcode_url = 'https://passport.baidu.com/v2/api/getqrcode'
        qr_params = {
            'lp': 'pc',
            'qrloginfrom': 'pc',
            'gid': gid,
            'callback': 'tangram_guid_1561697778375',
            'apiver': 'v3',
            'tt': get_cur_timestamp(),
            'tpl': 'mn',
            '_': get_cur_timestamp()
        }

        qrcode_r = requests.get(qrcode_url, headers=headers, params=qr_params, cookies=login_cookies, verify=False)
        signcode = re.search(r'[\w]{32}', qrcode_r.text).group()
        qrimg_url = 'https://passport.baidu.com/v2/api/qrcode?sign=%s&lp=pc&qrloginfrom=pc' % signcode
        # Save the QR code image to a file
        with open('qrcode.jpg', 'wb') as f:
            qr_r = requests.get(qrimg_url, headers=headers, cookies=init_cookies, verify=False)
            f.write(qr_r.content)

        # Get the PASSID, UBI and HISTORY cookies
        login_cookies['HOSUPPORT'] = '1'
        loginhistory_url = 'https://passport.baidu.com/v2/api/?loginhistory'
        loginhistory_params = {
            'token': token,
            'tpl': 'mn',
            'apiver': 'v3',
            'tt': get_cur_timestamp(),
            'loginversion': 'v4',
            'gid': gid,
            'traceid': None,
            'callback': 'bd__cbs__um4fp5'
        }
        loginhistory_r = requests.get(loginhistory_url, params=loginhistory_params, headers=headers, cookies=login_cookies, verify=False)
        loginhistory_cookiestr = loginhistory_r.headers['Set-Cookie']
        passid = None
        ubi = None
        history = None
        # Get PASSID
        paasid_searchres = re.search(r'PASSID=(\w*[^;])', loginhistory_cookiestr)
        if paasid_searchres:
            passid = paasid_searchres.group(1)
        # Get UBI
        ubi_searchres = re.search(r'UBI=([\w%-]*[^;])', loginhistory_cookiestr)
        if ubi_searchres:
            ubi = ubi_searchres.group(1)
        # Get HISTORY
        history_searchres = re.search(r'HISTORY=([\w%-]*[^;])', loginhistory_cookiestr)
        if history_searchres:
            history = history_searchres.group(1)
        login_cookies['PASSID'] = passid
        login_cookies['UBI'] = ubi
        login_cookies['HISTORY'] = history

        # Poll the scan status to check whether the QR code has been scanned
        channel_url = 'https://passport.baidu.com/channel/unicast'
        chanel_param = {
            'channel_id': signcode,
            'tpl': 'mn',
            'gid': gid,
            'callback': 'tangram_guid_1561776159383',
            'apiver': 'v3',
            'tt': get_cur_timestamp(),
            '_': get_cur_timestamp()
        }

        while True:
            channel_r = requests.get(channel_url, headers=headers, cookies=login_cookies, verify=False, params=chanel_param)
            channel_r_json = parsecallback_tojson(channel_r.text)
            # Get bdussd. Big pitfall: once the scan is detected, the request must be sent one more time, otherwise bdussd is not returned
            if (channel_r_json['errno'] == 0) and (json.loads(channel_r_json['channel_v'])['status']) == 1:
                channel_r = requests.get(channel_url, headers=headers, cookies=login_cookies, verify=False,params=chanel_param)
                bdussd = json.loads(parsecallback_tojson(channel_r.text)['channel_v'])['v']
                break

        # Log in with bdussd
        login_url = 'https://passport.baidu.com/v3/login/main/qrbdusslogin'
        login_params = {
            'v': get_cur_timestamp(),
            'bduss': bdussd,
            'u': 'https://www.baidu.com/?tn=62095104_7_oem_dg',
            'qrcode': '1',
            'pl': 'mn',
            'apiver': 'v3',
            'tt': get_cur_timestamp(),
            'traceid': None,
            'callback': 'bd__cbs__raxv3h'
        }
        login_r = requests.get(login_url, headers=headers, params=login_params, cookies=login_cookies, verify=False)

        # Parse the authorization cookies
        auth_cookies = {}
        loginsuccess_cookiestr = login_r.headers['Set-Cookie']
        BDUSS = re.search(r'BDUSS=([\w%-]*[^;])', loginsuccess_cookiestr).group(1)
        STOKEN = re.search(r'STOKEN=([\w%-]*[^;])', loginsuccess_cookiestr).group(1)
        PTOKEN = re.search(r'PTOKEN=([\w%-]*[^;])', loginsuccess_cookiestr).group(1)
        auth_cookies['BDUSS'] = BDUSS
        auth_cookies['STOKEN'] = STOKEN
        auth_cookies['PTOKEN'] = PTOKEN
        with open('auth_cookies.txt', 'w') as f:
            f.write(json.dumps(auth_cookies))
        return auth_cookies

if __name__ == '__main__':
    print(QRLogin.get_auth_cookie())

Validity test

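Use the authorization cookies obtained earlier to visit the Baidu personal center page. If the username can be retrieved, the login succeeded. To get the username, inspect the page elements, find the a tag with class=ibx-uc-nick, and extract it with a regular expression.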

The code is as follows:

def check():
    headers = {
        'Host': 'i.baidu.com',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
        'Referer': 'https://www.baidu.com/?tn=62095104_7_oem_dg',
        'Origin': 'https://baidu.com'
    }
    # This call blocks until the QR code has been scanned
    auth_cookies = QRLogin.get_auth_cookie()
    ibaidu_url = 'https://i.baidu.com'
    ibaidu_r = requests.get(ibaidu_url, cookies=auth_cookies, headers=headers)
    pattern = r'<a[^>]*class=\"ibx-uc-nick\">([^<]*)</a>'
    myname = re.search(pattern, ibaidu_r.text).group(1)
    print("Your login username is: %s" % myname)
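
If the cookies are rejected, the page will not contain the nickname element and the search in check() returns None, so calling group(1) on it raises an exception. A small sketch of a more forgiving extraction is given below; the sample HTML and the name extract_username are hypothetical, and the pattern assumes the class attribute is written exactly as class="ibx-uc-nick".

```python
import re

# Anchor tag carrying the nickname; other attributes may surround the class
NICK_PATTERN = re.compile(r'<a[^>]*class="ibx-uc-nick"[^>]*>([^<]*)</a>')

def extract_username(html):
    """Return the nickname text, or None if the element is not present."""
    match = NICK_PATTERN.search(html)
    return match.group(1).strip() if match else None

# Hypothetical page fragment, for illustration only
sample_html = '<div><a href="#" class="ibx-uc-nick">example_user</a></div>'
username = extract_username(sample_html)
```

A None result can then be treated as "login not valid" instead of crashing the check.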

Origin www.cnblogs.com/chenqm/p/11122928.html