Simulated Login with CAPTCHA Handling and Cookies

I. Background

Related posts: https://www.jianshu.com/p/9fce799edf1e

https://blog.csdn.net/h19910518/article/details/79348051

Cookie

HTTP is a stateless protocol: each request is independent of the one before it, and the server keeps no link between them. Statelessness keeps the protocol fast, but sometimes several requests need to be associated with one another. For example, after logging in on page a you expect to still be logged in on page b. These are two different pages, and therefore two separate HTTP requests with no relationship between them, so b cannot simply read from a that you have already logged in. The login state could be recorded in a database, but that puts extra load on the server.

A cookie is data that a website stores on the user's local device in order to identify the user and track the session. When you visit a site, it saves a small text file on your machine that can record your user ID, password, pages viewed, time spent, and similar information. When you return to the site, the cookie is sent to the same server with each request; the server reads it, recognizes you, and can respond accordingly.
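
As a rough illustration of the exchange described above, the sketch below uses Python's standard http.cookies module to build the Set-Cookie header a server might send and to parse the Cookie header a browser sends back. The cookie name user_id and its value are made up for the example.

```python
from http.cookies import SimpleCookie

# Server side: build a Set-Cookie header for the response.
# "user_id" and "42" are hypothetical values used only for illustration.
server_cookie = SimpleCookie()
server_cookie["user_id"] = "42"
server_cookie["user_id"]["path"] = "/"
set_cookie_header = server_cookie.output(header="Set-Cookie:")
print(set_cookie_header)

# Client side: on the next request the browser returns the name/value pair
# in a Cookie header, which the server parses to recognize the user.
cookie_header = "user_id=42"
received = SimpleCookie()
received.load(cookie_header)
print(received["user_id"].value)
```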

Session

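Session: in computing, and in web applications in particular, this is known as "session control". A Session object stores the attributes and configuration information needed for a specific user's session.

1. The client sends a request, and the server answers with a response that carries a Set-Cookie header.
2. To produce that cookie, the server generates a session_id with its session algorithm and keeps a mapping between the session_id and the session data; the session_id is what the cookie carries.
3. On later visits the browser sends its requests with a Cookie header, so the user does not have to log in again.

Variables stored in a Session object are not lost when the user moves between pages; they persist for the whole session. When a user requests a page from the application and does not yet have a session, the web server automatically creates a Session object. When the session expires or is abandoned, the server terminates it.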

II. Preparation

1. Log in to Renren manually

Check which type of CAPTCHA the site uses
Use Fiddler to capture the request data (request URL, cookie data)
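
A form body copied from Fiddler is a URL-encoded string, and it can be turned into a Python dict with the standard library instead of being retyped by hand. The sketch below assumes a shortened, hypothetical body that reuses a few of the field names appearing later in this post.

```python
from urllib.parse import parse_qsl

# Hypothetical request body, as it might be copied from Fiddler's raw view.
captured_body = "captcha_type=web_login&domain=renren.com&f=&icode=&key_id=1"

# keep_blank_values=True preserves empty fields such as "f" and "icode",
# which parse_qsl would otherwise drop.
data = dict(parse_qsl(captured_body, keep_blank_values=True))
print(data)
```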

2. Yundama (云打码) CAPTCHA-recognition platform

Register accounts (a user account and a developer account)
Read the developer documentation
Download the DLL

III. Main steps

Request the CAPTCHA image with the get method and save it locally.
Upload the local image to Yundama for recognition.
Put the recognition result, together with the other fields captured with Fiddler (account, password, and so on), into the data argument.
Instantiate a Session object and call its post method with the url and data arguments to log in.
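
The third step can be sketched as a small helper that copies the captured fields and fills in the recognized CAPTCHA text. The function name build_login_data is hypothetical; the icode field name is the one used in the captured form data in this post.

```python
def build_login_data(captured_fields, captcha_text=None):
    """Return a copy of the captured form fields with the CAPTCHA filled in.

    The captured template is left untouched so it can be reused
    for another login attempt.
    """
    data = dict(captured_fields)
    if captcha_text is not None:
        data["icode"] = captcha_text
    return data

# Hypothetical, shortened set of captured fields.
template = {"domain": "renren.com", "icode": ""}
payload = build_login_data(template, "abcd")
print(payload)
```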

Code

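import urllib.request  # urlretrieve lives in the urllib.request submodule

import requests
from lxml import etree
from YDMHTTPDemo3.x import YDMHttp  # import the downloaded Yundama client; adjust the module name to match your local file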

# Helper that sends a CAPTCHA image to Yundama and returns the recognized text
def getVCode(username, password, filename, codeType):
    appid = 'xxxx'
    appkey = '3b753c7c24fba02dexxxxxxxxxxxxxxx'
    timeout = 30
    if username == 'username':
        print('Please set the parameters before testing')
        return None
    yundama = YDMHttp(username, password, appid, appkey)  # log in to Yundama
    uid = yundama.login()
    print('uid: %s' % uid)
    balance = yundama.balance()
    print('balance: %s' % balance)
    # Upload the CAPTCHA image and get the recognition result back
    cid, result = yundama.decode(filename, codeType, timeout)
    print('cid: %s, result: %s' % (cid, result))
    return result

target1_url = "http://www.renren.com/"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36',
}
response = requests.get(url=target1_url, headers=headers)
ht = response.text
tree = etree.HTML(ht)
# Look for the login CAPTCHA image. This assumes verifyPic_login is the id of
# the <img> element; a bare [@verifyPic_login] would test for an attribute with that name.
img = tree.xpath('//img[@id="verifyPic_login"]/@src')
# The fields in data were captured with Fiddler during a manual login.
data = {
    'captcha_type': 'web_login',
    'domain': 'renren.com',
    'email': '[email protected]',  # email address
    'f': '',
    'icode': "",  # CAPTCHA text
    'key_id': '1',
    'origURL': 'http://www.renren.com/home',
    'password': '06735438342bxxxxxxxxxxxxxxxxxxxxxxxxx',  # encrypted password
    'rkey': '8a339012c2e46e9xxxxxxxxxxxxxxxxxx',
}
target2_url = 'http://www.renren.com/ajaxLogin/login?1=1&uniqueTimestamp=2019841747473'
if img:  # a CAPTCHA is present
    urllib.request.urlretrieve(img[0], './getimage.jpg')
    VCode = getVCode('Sroxi', 'xxx', './getimage.jpg', '1006')
    print(VCode)
    data['icode'] = VCode
    
# Log in through a Session so the cookies set by the login response are kept
session = requests.Session()
session.post(url=target2_url, data=data, headers=headers)
# Request a page that requires login, reusing the session's cookies
target3_url = 'http://www.renren.com/58xxxxxx'
response1 = session.get(url=target3_url, headers=headers)
htmlfile = response1.text
with open('renren.html', 'w', encoding='utf8') as f:
    f.write(htmlfile)
print('finish')
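
One thing to watch in the script above: the CAPTCHA image is downloaded outside the Session that later posts the form. Sites commonly tie the expected CAPTCHA answer to the visitor's cookies, so a CAPTCHA fetched without those cookies may not match at login; whether Renren behaves this way has not been verified here. The sketch below is a variant, not the original author's method, that downloads the image and submits the form through the same session. The function name login_with_captcha and the recognise callback are hypothetical; session is expected to behave like requests.Session.

```python
def login_with_captcha(session, captcha_url, login_url, data, recognise,
                       image_path="./getimage.jpg", headers=None):
    """Fetch the CAPTCHA and submit the login form through one session.

    session   -- object with get/post methods, e.g. a requests.Session
    recognise -- callable taking an image path and returning the CAPTCHA
                 text (for example a wrapper around getVCode)
    """
    # Download the image with the session so its cookies are shared
    image = session.get(captcha_url, headers=headers)
    with open(image_path, "wb") as f:
        f.write(image.content)

    # Copy the captured fields and fill in the recognized text
    payload = dict(data)
    payload["icode"] = recognise(image_path)

    # Post with the same session; login cookies stay on it afterwards
    return session.post(login_url, data=payload, headers=headers)
```

With the names from the script, a call might look like login_with_captcha(requests.Session(), img[0], target2_url, data, lambda p: getVCode('Sroxi', 'xxx', p, '1006'), headers=headers).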