Simulating a Sina Weibo login with an account and password

A few months ago I tried to simulate logging in to Sina Weibo with an account and password. After a long time I still had no result, so in the end I cheated and crawled the data with a cookie instead. Over the past few days I needed a real account-and-password login for something else, so I looked into it again, and this time the result came out quickly. It makes me wonder why it took so long before. Did my programming improve over these few months? -_-||

1. Tools

Chrome + Fiddler.
Notes:
1. I originally wanted to use the HttpFox add-on for Firefox, but Firefox was upgraded a few days ago and I could no longer find the tool afterwards.
2. If Fiddler cannot capture Chrome's traffic, a browser plug-in may be interfering; disable the plug-in and try again.
3. Sina now uses HTTPS, so HTTPS capturing has to be enabled in Fiddler. With the upgraded Firefox, the certificate imported by Fiddler was treated as insecure and I could not open Sina's site at all, which was very frustrating.
In short, use Chrome.

2. Sina authentication process

One advantage of Fiddler is that the captured information is very complete, or at least it feels more complete than what HttpFox shows. This time I used Fiddler to go through Sina's authentication process carefully; the summary follows.
Sina has many login entry points and authenticates with SSO (Single Sign-On): after logging in at one login node, you do not need to log in again when accessing its other services. So to simulate a Weibo login you do not have to log in at weibo.com directly, which is particularly troublesome because weibo.com may ask for a verification code. Another login URL is used here instead: https://login.sina.com.cn/signup/signin.php. Logging in through this URL does not require a verification code, but it does not log you in to Weibo yet. After authenticating successfully at that URL, use the cookie you obtained to authenticate against weibo.com or weibo.cn and obtain the cookie of weibo.com or weibo.cn. Crawling can then be done with that cookie, which is what my earlier blog post covered.
Back to the login process: open Fiddler first, then open https://login.sina.com.cn/signup/signin.php.
[Image: the login page]
Enter the username and password and click Login. Fiddler shows that many URLs are requested.
[Image: the requests captured in Fiddler]
The more important URLs are:
1. login.sina.com.cn/sso/prelogin.php?entry=account&callback=sinaSSOController.preloginCallBack&su=MTU4NTAzMTg0MDc%3D&rsakt=mod&client=ssologin.js(v1.4.15)&_=1512008557384
2. login.sina.com.cn/sso/login.php?client=ssologin.js(v1.4.15)&_=1512008557445
3. passport.weibo.com/wbsso/login?ticket=ST-NjM3MjA0MTM1NA%3D%3D-1512008674-gz-AD5F649D2BAA76814D96D3CB494D033B-1&ssosavestate=1543544674&callback=sinaSSOController.doCrossDomainCallBack&scriptId=ssoscript0&client=ssologin.js(v1.4.19)&_=1512008557867
4. passport.weibo.cn/sso/crossdomain?action=login&savestate=1&callback=sinaSSOController.doCrossDomainCallBack&scriptId=ssoscript3&client=ssologin.js(v1.4.19)&_=1512008557868
As shown in the figure:
[Image: the four requests in Fiddler]
Request 1 is Sina's pre-login, request 2 is the actual Sina login, request 3 obtains cross-domain authentication for weibo.com, and request 4 does the same for weibo.cn. All we need are the cookies returned by requests 3 and 4.

2.1 Pre-login

Click the prelogin URL (request 1) in Fiddler to see the request parameters and the response. The request parameters are as follows:
[Image: request parameters of the pre-login call]
According to other bloggers' findings, su is the username encoded with base64 and _ is a timestamp; the other values are fixed.
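As a rough illustration of the parameters above, the sketch below builds the pre-login URL with Python's standard library. The fixed values are copied from request 1 as captured; the function names are hypothetical, and URL-quoting the username before base64-encoding it is an assumption taken from public write-ups rather than something verified here.

```python
import base64
import time
from urllib.parse import quote, urlencode

PRELOGIN_URL = "https://login.sina.com.cn/sso/prelogin.php"


def encode_su(username):
    """Return the su value: base64 of the username.

    Assumption: the username is URL-quoted before being base64-encoded.
    """
    return base64.b64encode(quote(username).encode("utf-8")).decode("ascii")


def build_prelogin_url(username, now=None):
    """Build the pre-login URL; `now` is a Unix time in seconds (for testing)."""
    millis = int((time.time() if now is None else now) * 1000)
    params = {
        "entry": "account",
        "callback": "sinaSSOController.preloginCallBack",
        "su": encode_su(username),
        "rsakt": "mod",
        "client": "ssologin.js(v1.4.15)",  # version string as seen in the capture
        "_": millis,                       # millisecond timestamp
    }
    return PRELOGIN_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_prelogin_url("user@example.com"))
```

Note that urlencode percent-encodes the parentheses in the client value, which is equivalent to the unencoded form seen in Fiddler.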
The response is as follows:
[Image: response of the pre-login call]
The important fields are nonce, pubkey, rsakv, and servertime. They are used in the next step to encrypt the password.
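Because the request carries a callback parameter, the response presumably arrives as JSON wrapped in a call to that callback. Under that assumption, a minimal sketch for pulling out the four fields could look like this; the function name and the sample body in the usage lines are made up for illustration.

```python
import json
import re


def parse_prelogin(body):
    """Extract the fields needed for login from a callback-wrapped JSON body.

    Assumption: the body looks like callbackName({...}).
    """
    match = re.search(r"\((\{.*\})\)", body, re.S)
    if match is None:
        raise ValueError("no JSON payload found in pre-login response")
    data = json.loads(match.group(1))
    return {key: data[key] for key in ("servertime", "nonce", "pubkey", "rsakv")}


if __name__ == "__main__":
    # Made-up sample; real values come from the pre-login response.
    sample = ('sinaSSOController.preloginCallBack({"servertime": 1512008557, '
              '"nonce": "ABC123", "pubkey": "EB2A", "rsakv": "1"})')
    print(parse_prelogin(sample))
```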

2.2 Login

After the pre-login (request 1) completes, the URL of request 2 is requested. It is a POST request, and the posted data is as follows:
[Image: POST data of the login request]
The nonce, pubkey, rsakv, and servertime values returned by the pre-login are all used to build the posted data, and sp is the password after RSA encryption. According to other bloggers' research, the RSA public key consists of the pubkey returned in the previous step as the modulus and the value "10001" (hexadecimal) specified in the js file as the exponent, and the servertime and nonce obtained in the previous step are added to the password before it is encrypted. I do not know much about encryption, so in my program I simply call the encryption routine from ssologin.js to encrypt the password. The other POST parameters are currently fixed and can be filled in directly.
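For readers who prefer not to call ssologin.js, the encryption can be approximated in pure Python. This is only a sketch and not the method used in my program: it assumes the plaintext is servertime, a tab, nonce, a newline, and the password (a layout reported in public write-ups), and that the js code applies PKCS#1 v1.5 padding. The function names are hypothetical, and a maintained RSA library would be a better choice than hand-rolled padding in real code.

```python
import os


def pkcs1_v15_pad(message, key_len):
    """Pad as 00 02 <random nonzero bytes> 00 <message> (PKCS#1 v1.5, type 2)."""
    pad_len = key_len - len(message) - 3
    if pad_len < 8:
        raise ValueError("message too long for this key size")
    padding = bytearray()
    while len(padding) < pad_len:
        # Keep only nonzero random bytes until enough have been collected.
        padding.extend(b for b in os.urandom(pad_len - len(padding)) if b != 0)
    return b"\x00\x02" + bytes(padding) + b"\x00" + message


def encrypt_password(password, servertime, nonce, pubkey_hex, exponent_hex="10001"):
    """Return the hex-encoded sp value.

    Assumption: plaintext layout is "<servertime>\\t<nonce>\\n<password>".
    """
    n = int(pubkey_hex, 16)    # modulus from the pre-login response
    e = int(exponent_hex, 16)  # 0x10001 == 65537
    key_len = (n.bit_length() + 7) // 8
    message = f"{servertime}\t{nonce}\n{password}".encode("utf-8")
    padded = pkcs1_v15_pad(message, key_len)
    cipher = pow(int.from_bytes(padded, "big"), e, n)
    return format(cipher, "x").zfill(key_len * 2)
```

Because the padding is random, the same password yields a different sp on every call, so the output cannot be compared byte for byte with a captured value.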
When the password is correct, the response is as follows:
[Image: response of the login request]
The returned crossDomainUrlList contains the credential URLs for cross-domain access to Sina services such as weibo.com, 97973, and weibo.cn.
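Assuming the login response body is plain JSON with a crossDomainUrlList array, the URLs can be grouped by host so that the weibo.com and weibo.cn entries are easy to pick out. The function name and the sample body are illustrative only.

```python
import json
from urllib.parse import urlparse


def cross_domain_urls(login_body):
    """Map each host in crossDomainUrlList to its cross-domain login URL.

    Assumption: the response body is a JSON object.
    """
    data = json.loads(login_body)
    urls = data.get("crossDomainUrlList")
    if not urls:
        raise ValueError("crossDomainUrlList missing; login probably failed")
    return {urlparse(url).hostname: url for url in urls}


if __name__ == "__main__":
    # Made-up sample with shortened URLs.
    sample = json.dumps({"crossDomainUrlList": [
        "https://passport.weibo.com/wbsso/login?ticket=ST-example",
        "https://passport.weibo.cn/sso/crossdomain?action=login",
    ]})
    for host, url in cross_domain_urls(sample).items():
        print(host, url)
```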

2.3 Get cookies

The previous step returned the login addresses for weibo.com and weibo.cn. Now visit those URLs, sending along the cookie obtained in the login step, to obtain the cookies of weibo.com or weibo.cn. The page data can then be crawled with those cookies.
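One way to carry cookies across these requests is a shared cookie jar. The sketch below uses only the standard library and hypothetical helper names: the same opener would be used for the login POST and for the cross-domain URLs, so the cookies set along the way are sent automatically. It has not been run against Sina's servers.

```python
import urllib.request
from http.cookiejar import CookieJar


def make_session():
    """Create an opener that stores and resends cookies through one jar."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def visit_urls(opener, urls, timeout=10):
    """Request each cross-domain URL so the server can set its cookies."""
    for url in urls:
        with opener.open(url, timeout=timeout) as response:
            response.read()


def cookie_header(jar, domain):
    """Format the cookies stored for `domain` (or its subdomains) as a Cookie header."""
    pairs = []
    for cookie in jar:
        host = cookie.domain.lstrip(".")
        if host == domain or host.endswith("." + domain):
            pairs.append(f"{cookie.name}={cookie.value}")
    return "; ".join(pairs)
```

After visit_urls has run, cookie_header(jar, "weibo.cn") gives a string that can be sent as the Cookie header of later crawl requests.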
Sad news (2017-12-05): today I found that when simulating the login with the program I can log in to weibo.com but not to weibo.cn. Every request to weibo.cn is redirected to the login page, even though the program does appear to obtain the weibo.cn cookie. If you have studied logging in to weibo.cn, please get in touch. QAQ

Personally I recommend crawling weibo.cn: weibo.com loads pages incrementally and returns Unicode-escaped content, among other issues, which makes it cumbersome to handle. Crawling weibo.cn should be simpler.

3. Code implementation

That is roughly the Weibo login process. Sina changes the login logic from time to time, though the changes should not be large, so before relying on it, walk through the steps above yourself to check whether any parameters have changed. The code has been uploaded: http://download.csdn.net/download/tyoukai_/10139542

Reference:
http://www.csuldw.com/2016/11/10/2016-11-10-simulate-sina-login/?utm_source=tuicool&utm_medium=referral
http://blog.csdn.net/fly_leopard/article/details/51082531
