Simulating a Taobao login with Python

Many tutorials online simulate a Taobao login, but nearly all of them rely on libraries such as scrapy, pyppeteer, or selenium. We have not covered those libraries yet, only requests, so today we will simulate a Taobao login using nothing but the requests library!

Before we get to Taobao, let's recall how we previously used requests to simulate logging in to Douban and Sina Weibo. Those were relatively simple logins: you only had to submit the username and password in a single login request. In other words, one step!

The Taobao login is more complicated. Why? Because it involves many parameters and more than one request! Let's first walk through the Taobao login process and understand how it works before writing any code, so that everything is easy to follow.

I. The Taobao login process

Taobao's ua parameter: ua stands for User-Agent. Taobao's ua parameter packs in information such as the browser, IP, computer, and time, and is then encrypted. It is used in many places, not just at login!

The figure above is a more detailed flow chart. At the code level, simulating a Taobao login breaks down into the following four steps:

After the username is entered, the browser sends a POST request to Taobao (taobao.com) to check whether slider verification is required.
After the user enters the password, the browser sends another POST request to Taobao (taobao.com) to verify the username and password; if they are correct, a token is returned.
The browser takes the token to Alibaba (alibaba.com) and exchanges it for an st code.
Once the browser has the st code, it uses it to obtain the cookies, and the login is complete.
Some readers may ask: once Taobao (taobao.com) has verified the credentials, why must the token be exchanged with Alibaba (alibaba.com) for an st code? We will come back to this later!
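As a rough sketch of how the four steps chain together, the hypothetical function below takes each step as a callable and runs them in order, stopping early if the slider check says a slider is required. The names `run_login_flow`, `needs_slider`, `verify_password`, `apply_st`, and `login_with_st` are illustrative assumptions; in the source code further down, the same steps live in `_user_check`, `_verify_password`, `_apply_st`, and `login`.

```python
from typing import Callable


def run_login_flow(
    needs_slider: Callable[[], bool],
    verify_password: Callable[[], str],
    apply_st: Callable[[str], str],
    login_with_st: Callable[[str], dict],
) -> dict:
    """Run the four login steps in order and return the resulting cookies.

    Each step is injected as a callable, so the sequencing can be read and
    tested without any network access. All names are hypothetical.
    """
    # Step 1: after the username is entered, ask whether a slider is required
    if needs_slider():
        raise RuntimeError("slider verification required; requests alone cannot solve it")
    # Step 2: verify username + encrypted password on taobao.com, which yields a token
    # (modelled here as the st-code application URL that carries the token)
    token_url = verify_password()
    # Step 3: take the token to the SSO side (alibaba.com) and exchange it for an st code
    st = apply_st(token_url)
    # Step 4: redeem the st code for the login cookies
    return login_with_st(st)


# Example run with stand-in steps (no network access)
cookies = run_login_flow(
    needs_slider=lambda: False,
    verify_password=lambda: "https://example.invalid/apply_st?token=abc",
    apply_st=lambda url: "st-123",
    login_with_st=lambda st: {"session": st},
)
print(cookies)  # {'session': 'st-123'}
```

Injecting the steps keeps the ordering logic separate from the HTTP details, which is also roughly how the source code splits its methods.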
II. Implementing the simulated login
So far we have only outlined the login process. Here, Brother Pig will first explain each step in detail and then post the implementation code!

1. Check whether a verification code is required
Currently, the slider verification code does not appear in most Taobao logins. Brother Pig logged in and out many times and saw it only once. So what controls whether the slider verification code is required?

As the figure above shows, when Brother Pig enters the username (which must be a mobile phone number), the browser sends a POST request to check whether the slider verification code is required. If the response is true, the slider appears; otherwise it does not, and usually it does not!

In the figure, we can see that this POST request uploads two parameters: username and ua!
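A minimal sketch of this check without the network, assuming, as the source code below does with `response.json()['needcode']`, that the reply is JSON containing a `needcode` field. The helper names and the sample replies are hypothetical, and whether `needcode` arrives as a JSON boolean or a string is an assumption, so the parser accepts both.

```python
import json
from urllib.parse import urlencode


def build_user_check_form(username: str, ua: str) -> str:
    """Form-encode the two fields this POST uploads: username and ua."""
    return urlencode({"username": username, "ua": ua})


def parse_needcode(response_text: str) -> bool:
    """Return True if the JSON reply says a slider verification code is needed.

    Accepts either a JSON boolean or the string "true" (the exact type is an assumption).
    """
    return str(json.loads(response_text)["needcode"]).lower() == "true"


print(build_user_check_form("13800000000", "abc"))  # username=13800000000&ua=abc
print(parse_needcode('{"needcode": false}'))        # False
```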

Earlier, Brother Pig said that ua is an encrypted bundle of information such as the browser, IP, and device, so Brother Pig suspects that Taobao decides whether to show the verification code not only based on the account but also based on the IP and device!

For example, if a large number of accounts log in from one device, Taobao can read the device identifier from the ua parameter and restrict that device!

2. Verify the username and password

This step is the fifth step in the sequence diagram above: requesting login. Here about 30 parameters, including the username, the ua parameter, and the encrypted password, are POSTed to Taobao (taobao.com) for verification. Let's implement it in code. Don't be intimidated by the number of parameters; they are all copied from the browser!
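Because the form is mostly copied from the browser, one way to keep it manageable is to paste the URL-encoded body from the browser's developer tools (or Charles) and overwrite only the per-user fields. This is a hypothetical helper, not part of the original code; the field names `TPL_username`, `ua`, and `TPL_password_2` are the ones the source code below uses, and the sample body is made up.

```python
from urllib.parse import parse_qsl


def build_verify_form(copied_body: str, username: str, ua: str, password2: str) -> dict:
    """Parse a URL-encoded form body copied from the browser, then set the per-user fields."""
    # keep_blank_values keeps fields such as 'ncoToken=' that the browser sends empty
    form = dict(parse_qsl(copied_body, keep_blank_values=True))
    form.update({
        "TPL_username": username,
        "ua": ua,
        "TPL_password_2": password2,  # the already-encrypted password copied from the browser
    })
    return form


form = build_verify_form("TPL_username=old&lang=zh_CN&ncoToken=&ua=x", "13800000000", "new-ua", "enc")
print(form["lang"], form["ua"])  # zh_CN new-ua
```

The resulting dict can be passed as `data=` to the session's `post`, exactly like the hand-written dict in the source code.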

You can see that the st-code application link carries a token. We will analyze this token later!
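The source code below pulls this link out of the returned page with a regular expression on a `<script src="...">` tag. The sketch shows that step in isolation. The sample page is made up, and the assumption that the token travels as a query parameter literally named `token` is hypothetical.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Same pattern as _verify_password in the source code below
APPLY_ST_PATTERN = re.compile(r'<script src="(.*?)"></script>')


def extract_apply_st_url(page: str) -> str:
    match = APPLY_ST_PATTERN.search(page)
    if match is None:
        raise RuntimeError("verification failed: no st-code application URL in the page")
    return match.group(1)


def token_from_url(url: str) -> str:
    # Assumption: the token is carried in a query parameter named "token"
    return parse_qs(urlsplit(url).query)["token"][0]


sample_page = '<html><script src="https://example.invalid/apply_st?token=XYZ&cb=f"></script></html>'
url = extract_apply_st_url(sample_page)
print(url)                  # https://example.invalid/apply_st?token=XYZ&cb=f
print(token_from_url(url))  # XYZ
```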

3. Apply for the st code

Above, we obtained a token from Taobao (taobao.com); this step exchanges that token for an st code.

Many people may wonder at this point: why is logging in to Taobao so cumbersome? Why not simply log in at taobao.com? Why verify the username and password with Taobao first, and then go to alibaba.com to exchange the token for an st code before logging in?

Any company's architecture is the result of gradual evolution. I think Taobao's login was certainly not this complicated at first. But as Alibaba grew, it split into many business lines, and these business lines are related. For example, once a user has logged in to a Taobao account, shouldn't Tmall work without logging in again? (Note that Taobao and Tmall use different domain names, taobao.com and tmall.com, so cookies cannot be shared between them.) Single sign-on emerged to solve this problem.

Single Sign-On, SSO for short, is one of the more popular solutions for integrating enterprise business systems. SSO is defined as follows: across multiple application systems, a user only needs to log in once to access all mutually trusted application systems. (Source: Baidu Encyclopedia)
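To make the idea concrete, here is a toy, in-memory model of the token-for-st-code exchange: an identity service issues a token after checking the password, the SSO service swaps that token for an st code, and a trusted site redeems the st code to start a session. It is a hypothetical illustration of the pattern, not Alibaba's actual protocol; the class name is invented, and making tokens and st codes single-use is an assumption of the toy.

```python
import secrets
from typing import Dict


class ToySSO:
    """Toy single sign-on broker (hypothetical; illustrates the pattern only)."""

    def __init__(self) -> None:
        self._tokens: Dict[str, str] = {}    # token -> username
        self._st_codes: Dict[str, str] = {}  # st code -> username

    def issue_token(self, username: str) -> str:
        """Called after the password check succeeds (the taobao.com side in the article)."""
        token = secrets.token_hex(8)
        self._tokens[token] = username
        return token

    def exchange_token(self, token: str) -> str:
        """Swap a valid token for an st code (the alibaba.com side); KeyError if unknown."""
        username = self._tokens.pop(token)  # pop makes the token single-use
        st = secrets.token_hex(8)
        self._st_codes[st] = username
        return st

    def redeem_st(self, st: str) -> str:
        """Any trusted site redeems the st code; returns the logged-in username."""
        return self._st_codes.pop(st)  # pop makes the st code single-use


sso = ToySSO()
st = sso.exchange_token(sso.issue_token("alice"))
print(sso.redeem_st(st))  # alice
```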

Many large companies implement single sign-on, and Ali's single sign-on system is presumably run by the parent company Alibaba (alibaba.com), with every subsidiary calling the parent company's interface!

Back to the question of why the Taobao login is so complicated. It is easy to understand: the user data lives at Taobao, so Taobao (taobao.com) first verifies the username and password and, if they are correct, generates a token. The browser then takes the token to Alibaba (alibaba.com) to apply for a single sign-on code (the st code), and Alibaba verifies the token and returns the st code. So the reason for exchanging the token for an st code is single sign-on!

Once you understand the design, the code is straightforward!
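In the source code, the st code is read from the application response with a regular expression on a `"data":{"st":"..."}` fragment. The sketch below isolates that step. The sample response text is made up for illustration, and the real response may wrap the JSON differently.

```python
import re

# Pattern equivalent to the one in _apply_st (braces escaped for readability)
ST_PATTERN = re.compile(r'"data":\{"st":"(.*?)"\}')


def extract_st(response_text: str) -> str:
    match = ST_PATTERN.search(response_text)
    if match is None:
        raise RuntimeError("failed to get st code")
    return match.group(1)


sample = 'callback({"code":200,"data":{"st":"1a2b3c"}})'  # hypothetical response body
print(extract_st(sample))  # 1a2b3c
```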


4. Log in with the st code

Once we have the st code, we can log in. This step uses the st code to obtain the login cookies.
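This step boils down to formatting the st code into the `vst.htm?st=` URL from the source code and, on success, finding the `top.location.href` redirect in the response. Because the source code sends the request through a `requests.Session`, cookies set by the response are stored on the session automatically, so nothing else has to be extracted. The sketch isolates the two string operations; percent-encoding the st code is an extra precaution not in the original, and the sample page is made up.

```python
import re
from urllib.parse import quote

# URL template and redirect pattern taken from the source code (vst_url and login())
VST_URL = "https://login.taobao.com/member/vst.htm?st={}"
REDIRECT_PATTERN = re.compile(r'top\.location\.href = "(.*?)"')


def build_vst_url(st: str) -> str:
    # Percent-encode defensively; an ordinary alphanumeric st code is unchanged
    return VST_URL.format(quote(st, safe=""))


def extract_redirect(page: str) -> str:
    match = REDIRECT_PATTERN.search(page)
    if match is None:
        raise RuntimeError("login failed: no redirect in the response")
    return match.group(1)


print(build_vst_url("1a2b3c"))  # https://login.taobao.com/member/vst.htm?st=1a2b3c
page = '<script>top.location.href = "https://i.taobao.com/my_taobao.htm";</script>'
print(extract_redirect(page))   # https://i.taobao.com/my_taobao.htm
```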

At this point, we have successfully simulated a Taobao login!

5. Get the Taobao nickname

In fact, we have already logged in to Taobao above and received the link to the user homepage. To further confirm that the login succeeded, we request the Taobao user homepage and extract the Taobao nickname along the way!
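The nickname sits in a hidden `<input id="mtb-nickname">` field on the homepage, which the source code matches with a regular expression. Below is a minimal sketch of that extraction. Unescaping HTML entities with `html.unescape` is an addition not in the original, and the sample markup is made up.

```python
import re
from html import unescape
from typing import Optional

# Same pattern as get_taobao_nick_name in the source code
NICK_PATTERN = re.compile(r'<input id="mtb-nickname" type="hidden" value="(.*?)"/>')


def extract_nickname(page: str) -> Optional[str]:
    """Return the nickname, or None if the field is missing (e.g. not logged in)."""
    match = NICK_PATTERN.search(page)
    return unescape(match.group(1)) if match else None


print(extract_nickname('<input id="mtb-nickname" type="hidden" value="tb_a&amp;b"/>'))  # tb_a&b
print(extract_nickname("<html>login page</html>"))  # None
```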

III. Summary

Having walked through the whole process, let's wrap up briefly from two angles: the code structure and the remaining problems.

1. Code structure

Here is a code structure diagram to give everyone an intuitive picture.

These are the four steps of the simulated Taobao login described earlier, now implemented in code!

2. Remaining problems

Before writing this tutorial, Brother Pig also read up on the topic online, then worked through it step by step with a browser and a packet-capture tool (Charles). The most important thing is to understand the overall flow of the Taobao login first; otherwise you will get lost once you start. Now let's talk about the current problems.

First, solving Taobao's slider. There is currently no good way to crack it with requests; we will tackle it after introducing some crawler frameworks!
Brother Pig logged in and out many times (more than 50), and the slider verification code did not appear.
Some people online use proxy IPs, but Brother Pig did not use any here. As long as you are not logging in extremely frequently or crawling large amounts of data, large sites are generally unlikely to block your IP: IP blocking has a false-positive rate and affects too many users, since blocking a single IP might cut off an entire residential community.
In the second step (verifying the username and password), about 30 parameters are uploaded. If verification still fails after you copy in your username, ua, and encrypted password, try replacing those 30 or so parameters with the values from your own browser!
The third and fourth steps occasionally fail with an error; just try again!
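Since the occasional errors in steps three and four go away on a retry, a small wrapper can do the retrying. This is a hypothetical helper, not part of the original code; keep the attempt count low, since frequent repeated logins are exactly the kind of activity warned about above.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(action: Callable[[], T], attempts: int = 3, delay: float = 1.0) -> T:
    """Call action() up to `attempts` times, sleeping `delay` seconds between failures."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: re-raise the last error
            time.sleep(delay)
    raise AssertionError("unreachable")
```

For example, `retry(ul.login, attempts=3, delay=2.0)` with the `UsernameLogin` object from the source code; note that `login()` redoes every step, including the password check.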
If you have read this far, the simulated Taobao login should feel much clearer. Interested readers can bookmark and share this post and try it out over the weekend. Once you have conquered the Taobao login, other logins are comparatively simple!

Below is the source code

# -*- coding:utf-8 -*-
import re
import os
import json
import requests


s = requests.Session()
# File used to persist (serialize) cookies between runs
COOKIES_FILE_PATH = 'taobao_login_cookies.txt'


class UsernameLogin:

    def __init__(self, username, ua, TPL_password2):
        """
        账号登录对象
        :param username: 用户名
        :param ua: 淘宝的ua参数
        :param TPL_password2: 加密后的密码
        """
        # 检测是否需要验证码的URL
        self.user_check_url = 'https://login.taobao.com/member/request_nick_check.do?_input_charset=utf-8'
        # 验证淘宝用户名密码URL
        self.verify_password_url = "https://login.taobao.com/member/login.jhtml"
        # 访问st码URL
        self.vst_url = 'https://login.taobao.com/member/vst.htm?st={}'
        # 淘宝个人 主页
        self.my_taobao_url = 'https://i.taobao.com/my_taobao.htm'

        # 淘宝用户名
        self.username = "手机号"
        # 淘宝关键参数,包含用户浏览器等一些信息,很多地方会使用,从浏览器或抓包工具中复制,可重复使用
        self.ua = ""
        # 加密后的密码,从浏览器或抓包工具中复制,可重复使用
        self.TPL_password2 = ""

        # 请求超时时间
        self.timeout = 3

    def _user_check(self):
        """
        检测账号是否需要验证码
        :return:
        """
        data = {
            'username': self.username,
            'ua': self.ua
        }
        try:
            response = s.post(self.user_check_url, data=data, timeout=self.timeout)
            response.raise_for_status()
        except Exception as e:
            print('检测是否需要验证码请求失败,原因:')
            raise e
        needcode = response.json()['needcode']
        print('是否需要滑块验证:{}'.format(needcode))
        return needcode

    def _verify_password(self):
        """
        验证用户名密码,并获取st码申请URL
        :return: 验证成功返回st码申请地址
        """
        verify_password_headers = {
redirectURL=https%3A%2F%2Fi.taobao.com%2Fmy_taobao.htm%3Fspm%3Da2d00.7723416.754894437.1.61531fc917M0p9%26ad_id%3D%26am_id%3D%26cm_id%3D%26pm_id%3D1501036000a02c5c3739',
            # ':scheme': 'https',
            'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            'cache-control': 'max-age=0',
            'content-length': '2858',
            'content-type': 'application/x-www-form-urlencoded',
            'sec-fetch-mode': 'navigate',
            'sec-fetch-site': 'same-origin',
            'sec-fetch-user': '?1',
            'Cache-Control': 'max-age=0',
            'Origin': 'https://login.taobao.com',
            'Upgrade-Insecure-Requests': '1',
            'User-Agent': '5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36',
            'Content-Type': 'application/x-www-form-urlencoded',
            'Referer': 'https://login.taobao.com/member/login.jhtml?redirectURL=https%3A%2F%2Fi.taobao.com%2Fmy_taobao.htm%3Fspm%3Da2d00.7723416.754894437.1.61531fc917M0p9%26ad_id%3D%26am_id%3D%26cm_id%3D%26pm_id%3D1501036000a02c5c3739',
        }
        # Form data submitted to taobao.com when logging in; if login fails, copy your own form data from the browser
        verify_password_data = {
            'TPL_username': self.username,
            'ncoToken': '1f1389fac2a670101d8a09de4c99795e8023b341',
            'slideCodeShow': 'false',
            'useMobile': 'false',
            'lang': 'zh_CN',
            'loginsite': 0,
            'newlogin': 0,
            'TPL_redirect_url': 'https://i.taobao.com/my_taobao.htm?spm=a2d00.7723416.754894437.1.61531fc917M0p9&ad_id=&am_id=&cm_id=&pm_id=1501036000a02c5c3739',
            'from': 'tb',
            'fc': 'default',
            'style': 'default',
            'keyLogin': 'false',
            'qrLogin': 'true',
            'newMini': 'false',
            'newMini2': 'false',
            'loginType': '3',
            'gvfdcname': '10',
            # 'gvfdcre': '68747470733A2F2F6C6F67696E2E74616F62616F2E636F6D2F6D656D6265722F6C6F676F75742E6A68746D6C3F73706D3D613231626F2E323031372E3735343839343433372E372E356166393131643970714B52693126663D746F70266F75743D7472756526726564697265637455524C3D68747470732533412532462532467777772E74616F62616F2E636F6D253246',
            'TPL_password_2': self.TPL_password2,
            'loginASR': '1',
            'loginASRSuc': '1',
            'oslanguage': 'zh-CN',
            'sr': '1920*1080',
            # 'osVer': 'macos|10.145',
            'naviVer': 'chrome|78.039047',
            'osACN': 'Mozilla',
            'osAV': '5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36',
            'osPF': 'Win32',
            'appkey': '00000000',
            'mobileLoginLink': 'https://login.taobao.com/member/login.jhtml?redirectURL=https://i.taobao.com/my_taobao.htm?spm=a2d00.7723416.754894437.1.61531fc917M0p9&ad_id=&am_id=&cm_id=&pm_id=1501036000a02c5c3739&useMobile=true',
            'showAssistantLink': 'false',
            'um_token': 'T274D86E0BEB4F2F2F527C889BADD92868CE10177BeFF895DE627CFE2D52A',
            'ua': self.ua
        }
        try:
            response = s.post(self.verify_password_url, headers=verify_password_headers, data=verify_password_data,
                              timeout=self.timeout)
            response.raise_for_status()
        except Exception as e:
            print('Username/password verification request failed, reason:')
            raise e
        # Extract the st-code application URL from the returned page
        apply_st_url_match = re.search(r'<script src="(.*?)"></script>', response.text)
        # Return it if present
        if apply_st_url_match:
            print('Username and password verified, st-code application URL: {}'.format(apply_st_url_match.group(1)))
            return apply_st_url_match.group(1)
        else:
            raise RuntimeError('Username/password verification failed! response: {}'.format(response.text))

    def _apply_st(self):
        """
        申请st码
        :return: st码
        """
        apply_st_url = self._verify_password()
        try:
            response = s.get(apply_st_url)
            # response.raise_for_status()
        except Exception as e:
            print('申请st码请求失败,原因:')
            raise e
        st_match = re.search(r'"data":{"st":"(.*?)"}', response.text)
        if st_match:
            print('获取st码成功,st码:{}'.format(st_match.group(1)))
            return st_match.group(1)
        else:
            raise RuntimeError('获取st码失败!response:{}'.format(response.text))
            # raise RuntimeError('获取st码失败!')

    def login(self):
        """
        使用st码登录
        :return:
        """
        # 加载cookies文件
        if self._load_cookies():
            return True
        # 判断是否需要滑块验证
        self._user_check()
        st = self._apply_st()
        headers = {
            'Host': 'login.taobao.com',
            'Connection': 'Keep-Alive',
            'User-Agent': '5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36'
        }
        try:
            response = s.get(self.vst_url.format(st), headers=headers)
            response.raise_for_status()
        except Exception as e:
            print('st码登录请求,原因:')
            raise e
        # 登录成功,提取跳转淘宝用户主页url
        my_taobao_match = re.search(r'top.location.href = "(.*?)"', response.text)
        if my_taobao_match:
            print('登录淘宝成功,跳转链接:{}'.format(my_taobao_match.group(1)))
            self._serialization_cookies()
            return True
        else:
            raise RuntimeError('登录失败!response:{}'.format(response.text))

    def _load_cookies(self):
        # 1. Check whether the serialized cookies file exists
        if not os.path.exists(COOKIES_FILE_PATH):
            return False
        # 2. Load the cookies
        s.cookies = self._deserialization_cookies()
        # 3. Check whether the cookies have expired
        try:
            self.get_taobao_nick_name()
        except Exception:
            os.remove(COOKIES_FILE_PATH)
            print('Cookies expired, deleted the cookies file!')
            return False
        print('Loaded the Taobao login cookies!')
        return True

    def _serialization_cookies(self):
        """
        Serialize the session cookies to a JSON file
        :return:
        """
        cookies_dict = requests.utils.dict_from_cookiejar(s.cookies)
        with open(COOKIES_FILE_PATH, 'w', encoding='utf-8') as file:
            json.dump(cookies_dict, file)
        print('Saved the cookies file!')

    def _deserialization_cookies(self):
        """
        Deserialize the cookies from the JSON file
        :return: a cookie jar
        """
        with open(COOKIES_FILE_PATH, 'r', encoding='utf-8') as file:
            cookies_dict = json.load(file)
        return requests.utils.cookiejar_from_dict(cookies_dict)

    def get_taobao_nick_name(self):
        """
        获取淘宝昵称
        :return: 淘宝昵称
        """
        headers = {
            'User-Agent': '5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36'
        }
        try:
            response = s.get(self.my_taobao_url, headers=headers)
            response.raise_for_status()
        except Exception as e:
            print('获取淘宝主页请求失败!原因:')
            raise e
        # 提取淘宝昵称
        nick_name_match = re.search(r'<input id="mtb-nickname" type="hidden" value="(.*?)"/>', response.text)
        if nick_name_match:
            print('登录淘宝成功,你的用户名是:{}'.format(nick_name_match.group(1)))
            return nick_name_match.group(1)
        else:
            raise RuntimeError('获取淘宝昵称失败!response:{}'.format(response.text))


if __name__ == '__main__':
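    # Taobao username (your mobile phone number)
    username = 'your mobile phone number'
    # Key Taobao parameter, copied from the browser or a packet-capture tool (it can be reused)
    ua = ''
    # Encrypted password, copied from the browser or a packet-capture tool (it can be reused)
    TPL_password2 = ''
    ul = UsernameLogin(username, ua, TPL_password2)
    ul.login()
    # Request the homepage to confirm the login and print the nickname
    ul.get_taobao_nick_name()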


Source: blog.csdn.net/weixin_43407092/article/details/102975955