Simulating a Weibo login with Python

Compared with the Zhihu simulated login covered previously, the Weibo simulated login in this article is more complicated.

Open the Firefox developer tools and clear the cookies of the relevant sites first; otherwise some important files may already be cached and the corresponding HTTP interactions will not show up. Then log in to Weibo and watch the HTTP traffic. The process is analyzed below.

Step 1: Pre-login
Many websites now perform a pre-login step, which encodes or encrypts the entered user name.
The response to the pre-login request is:
sinaSSOController.preloginCallBack({"retcode":0,"servertime":1504752958,"pcid":"GZ-09567206091c76921bb8a6a83f424755d99a","nonce":"1468SE","pubkey":"EB2A38568661887FA180BDDB5CABD5F21C7BFD59C090CB2D245A87AC253062882729293E5506350508E7F9AA3BB77F4333231490F915F6D63C55FE2F08A49B353F444AD3993CACC02DB784ABBB8E42A9B1BBFFFB38BE18D78E87A0E41B9B8F73A928EE0CCEE1F6739884B9777E4FE9E88A1BBE495927AC4A799B3181D6442443","rsakv":"1330428213","is_openlock":0,"showpin":0,"exectime":12})

Looking at the request parameters, su is presumably the processed user name, and another parameter, a long string of digits, is most likely a timestamp. After repeating the simulated login many times, only these parameters turn out to change. What the data returned in the response is used for is not clear yet; it is analyzed later.
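The pre-login response is JSON wrapped in a callback call, so it has to be unwrapped before it can be parsed. The following is a minimal sketch of one way to do that; the helper name parse_jsonp is hypothetical, and the pubkey in the sample is shortened for readability.

```python
import json
import re


def parse_jsonp(text):
    """Return the JSON object wrapped in a callback(...) style response."""
    # Capture everything between the first "({" and the last "})".
    match = re.search(r"\((\{.*\})\)", text, re.S)
    if match is None:
        raise ValueError("no JSON payload found in response")
    return json.loads(match.group(1))


# Sample modeled on the response above (pubkey shortened for readability).
sample = (
    'sinaSSOController.preloginCallBack({"retcode":0,'
    '"servertime":1504752958,"nonce":"1468SE","pubkey":"EB2A3856",'
    '"rsakv":"1330428213","showpin":0})'
)

prelogin = parse_jsonp(sample)
print(prelogin["servertime"], prelogin["nonce"], prelogin["rsakv"])
```

The resulting dictionary gives direct access to servertime, nonce, pubkey and rsakv.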

Step 2: Login
Going through the HTTP requests in order, one of them looks very likely to be the login request.

Judging by its request URL and the amount of data it posts, this must be the login request. Its parameters include the servertime, nonce, rsakv and other values returned by the earlier pre-login. The next step is to repeat the request several times to find out which parameters are constant and which vary.

The conclusion: sp, su, servertime, nonce and rsakv vary. Since servertime, nonce and rsakv can be taken from the pre-login response, the key is to work out sp and su.
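The split between constant and variable parameters can be expressed as a small helper. This is only a sketch: the function name variable_login_fields is hypothetical, and the constant fields are not listed here, so they would have to be copied from the request observed in the browser.

```python
def variable_login_fields(prelogin, su, sp):
    """Collect the login parameters that change from one login to the next.

    prelogin is the parsed pre-login response; su and sp are the processed
    user name and password.
    """
    return {
        "su": su,
        "sp": sp,
        "servertime": prelogin["servertime"],
        "nonce": prelogin["nonce"],
        "rsakv": prelogin["rsakv"],
    }


# Values taken from the pre-login response shown earlier; su and sp are dummies.
prelogin = {"servertime": 1504752958, "nonce": "1468SE", "rsakv": "1330428213"}
fields = variable_login_fields(prelogin, su="dummy-su", sp="dummy-sp")
print(sorted(fields))
```

The full POST body would then be the constant fields updated with this dictionary.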

The parameters also reveal one more important piece of information: the request uses the ssologin.js file, which plays a crucial role.

Look at the HTTP request for ssologin.js.

Opening https://i.sso.sina.com.cn/js/ssologin.js shows the content of the file, which contains the routines that encode and encrypt the user name and password.

Search for username in ssologin.js. Going through the matches leads to this line:
username = sinaSSOEncoder.base64.encode(urlencode(username));
So Weibo apparently URL-encodes the user name first and then Base64-encodes the result. Checking this with Python code confirms it.
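The user name processing can be reproduced in Python with the standard library. This is a minimal sketch; the function name encode_username is hypothetical, and urllib.parse.quote is assumed to be a close enough match for the JavaScript urlencode function (the two may differ for a few punctuation characters).

```python
import base64
from urllib.parse import quote


def encode_username(username):
    """URL-encode the user name, then Base64-encode the result (the su value)."""
    # safe="" so that characters such as "/" and "@" are percent-encoded too.
    url_encoded = quote(username, safe="")
    return base64.b64encode(url_encoded.encode("utf-8")).decode("ascii")


su = encode_username("user@example.com")
print(su)
```

Decoding the result with base64.b64decode should give back the URL-encoded user name, which is a quick way to compare against the su value seen in the browser.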

Next, search for password in the file. Going through the matches leads to this line:
password = RSAKey.encrypt([me.servertime, me.nonce].join("\t") + "\n" + password)
This line evidently encrypts the password. RSA is one of the most widely used public-key cryptosystems, and practically every mainstream language has a library that implements it. A public-key cryptosystem revolves around a public key and a private key, and encryption requires the public key. So what is the public key?

Near that line there is another one:
RSAKey.setPublic(me.rsaPubkey, "10001");
This line presumably sets the public key. Recall the pubkey returned during pre-login: that should be rsaPubkey. The second argument, "10001", is the public exponent in hexadecimal, that is, 65537. Even with all these parameters and the same processing steps, you cannot reproduce the exact value the browser sends, because the RSA implementation adds random padding to make the ciphertext harder to attack. The result can therefore only be verified by checking whether the login succeeds. The invention of RSA was a turning point in cryptography; interested readers can look up the related papers.
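The password encryption can be sketched with the Python standard library alone. The function name encrypt_password is hypothetical, and the sketch assumes that the JavaScript RSAKey uses PKCS#1 v1.5 (type 2) padding and that the server accepts the ciphertext as a zero-padded hex string; neither is confirmed here, so treat it as a starting point rather than a faithful port.

```python
import os


def encrypt_password(password, servertime, nonce, pubkey_hex, exponent_hex="10001"):
    """Encrypt "servertime\\tnonce\\npassword" with textbook RSA plus random padding."""
    n = int(pubkey_hex, 16)      # modulus from the pre-login "pubkey" field
    e = int(exponent_hex, 16)    # "10001" is hexadecimal for 65537
    key_len = (n.bit_length() + 7) // 8

    message = "{}\t{}\n{}".format(servertime, nonce, password).encode("utf-8")
    pad_len = key_len - 3 - len(message)
    if pad_len < 8:
        raise ValueError("message too long for this key")

    # Assumed padding: 0x00 0x02 <non-zero random bytes> 0x00 <message>.
    padding = b""
    while len(padding) < pad_len:
        chunk = os.urandom(pad_len - len(padding))
        padding += bytes(b for b in chunk if b != 0)
    block = b"\x00\x02" + padding + b"\x00" + message

    cipher = pow(int.from_bytes(block, "big"), e, n)
    # Hex string, zero-padded to the full key length.
    return format(cipher, "0{}x".format(key_len * 2))
```

Because the padding is random, two calls with identical arguments return different strings, which matches the observation that the result can only be checked by attempting a login. In practice a maintained RSA library is preferable to hand-written padding code.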

There are two ways to reproduce the user name and password processing: 1. reimplement the key code in Python; 2. run the JS code directly through PyExecJS.

Step 3: Redirection
On most sites, once the second step completes, the HTML of the home page is returned. Weibo is a bit different: it goes through a series of redirects before reaching the home page.

Let's look at the HTTP response to a successful login request.

It is a piece of HTML in which a call to location.replace jumps to another URL. We then look for the request to the URL passed to that function and inspect its response.
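When this step is simulated, the jump target has to be read out of the HTML, because an HTTP client does not execute the script. A minimal sketch with a regular expression follows; the function name extract_redirect_url and the sample HTML are hypothetical, and the real page may quote or format the call differently.

```python
import re


def extract_redirect_url(html):
    """Return the URL passed to location.replace(...), or None if there is none."""
    # Accept single or double quotes around the URL.
    match = re.search(r"location\.replace\(\s*(['\"])(.+?)\1\s*\)", html)
    return match.group(2) if match else None


# Made-up page for illustration only.
sample_html = (
    "<html><head><script>"
    'location.replace("http://example.com/next?ticket=abc");'
    "</script></head></html>"
)

print(extract_redirect_url(sample_html))
```

The extracted URL is the one to request next.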

The HTML in that response also jumps automatically, so we continue with the HTTP request to the address it points to.

Its status code is 302, so it is redirected. The response to the redirected HTTP request is:
parent.sinaSSOController.feedBackUrlCallBack({"result":true,"userinfo":{"uniqueid":"1797442073","userid":null,"displayname":null,"userdomain":"?wvr=5&lf=reg"}});
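This response is again JSON wrapped in a callback, so its fields can be read the same way as the pre-login data. The sketch below uses a hypothetical helper name, parse_feedback, and treats "result" as a success flag, which is an assumption based only on its name and value.

```python
import json
import re


def parse_feedback(text):
    """Extract the userinfo object from a feedBackUrlCallBack(...) response."""
    match = re.search(r"feedBackUrlCallBack\((\{.*\})\)", text, re.S)
    if match is None:
        raise ValueError("unexpected response format")
    data = json.loads(match.group(1))
    # Assumption: "result" being true means the login was accepted.
    if not data.get("result"):
        raise RuntimeError("login was not confirmed")
    return data["userinfo"]


sample = (
    'parent.sinaSSOController.feedBackUrlCallBack({"result":true,"userinfo":'
    '{"uniqueid":"1797442073","userid":null,"displayname":null,'
    '"userdomain":"?wvr=5&lf=reg"}});'
)

userinfo = parse_feedback(sample)
print(userinfo["uniqueid"], userinfo["userdomain"])
```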

It is not clear yet what this response is for. Continuing down the list of HTTP requests in order leads to the next request of interest.

The parameter of this HTTP request is exactly the userdomain value from the previous response. Its status code is also 302, so we look at the request it redirects to.

The response to that request is the HTML of the page shown after logging in. At this point the simulated Weibo login has succeeded.

A login may also involve a verification code (CAPTCHA). There are three ways to handle it: 1. Pay for it: there are now many CAPTCHA-solving platforms, such as cloud coding, which all provide APIs for integrating their service. 2. Train a classifier yourself with deep learning methods, such as a convolutional neural network. 3. Save the verification code image and show it to the user, who then types it in manually.
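The third option is the simplest to implement. The sketch below assumes the image bytes have already been downloaded; the function name ask_user_for_captcha, the default file name and the ask parameter are all hypothetical, with ask existing only so the prompt can be replaced in tests.

```python
import os
import tempfile


def ask_user_for_captcha(image_bytes, path=None, ask=input):
    """Save the verification code image and return what the user types."""
    if path is None:
        path = os.path.join(tempfile.gettempdir(), "captcha.png")
    with open(path, "wb") as image_file:
        image_file.write(image_bytes)
    answer = ask("Open {} and type the characters you see: ".format(path))
    return answer.strip()
```

The returned string would then be sent along with the login request in whatever field the site expects for the verification code.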

This article is based on http://www.jianshu.com/p/816594c83c74 . The code itself is not difficult; the analysis of the login protocol is the core of the work.

The implementation code of this article: https://github.com/EdwardLee0331/weiboLogin
