Python Crawler Tutorial: Simulating a GitHub Login with requests

1. Introduction to Cookies

HTTP is a stateless protocol. Without some additional mechanism, neither the remote server nor the client can know what happened in their earlier exchanges. Cookies are one such mechanism. A typical use of a cookie is to record that a user has logged in to a site.

  1. After the user logs in successfully, the server sends the client a cookie (usually encrypted).
  2. The client (usually a web browser) saves the cookie it received.
  3. The next time the client connects, it sends the cookie back to the server. The server checks its contents and restores the logged-in state, so the user does not have to log in again.
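
The three steps above can be sketched with Python's standard http.cookies module, without touching the network. This is only an illustration: the cookie name session_id and the value abc123 are made up for the example, and a real server would add attributes such as an expiry time.

```python
from http.cookies import SimpleCookie

# Step 1 (server side): after a successful login, issue a cookie.
# "session_id" and "abc123" are hypothetical values for illustration.
server_cookie = SimpleCookie()
server_cookie["session_id"] = "abc123"
server_cookie["session_id"]["path"] = "/"
set_cookie_header = server_cookie.output(header="Set-Cookie:")
print(set_cookie_header)

# Step 2 (client side): parse the header value and keep the cookie.
client_jar = SimpleCookie()
client_jar.load(set_cookie_header.split(":", 1)[1].strip())

# Step 3 (client side): send the stored cookie back on the next request.
cookie_header = "; ".join(
    f"{name}={morsel.value}" for name, morsel in client_jar.items()
)
print("Cookie:", cookie_header)
```

A browser does all of this automatically; the sketch only makes the exchange visible.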

2. Using Cookies with requests

When a browser connects to a remote server as a client, the server generates a SessionID as needed and attaches it to a cookie sent to the browser. From then on, as long as the cookie is still valid, the browser and the server use this SessionID for their connections, and the browser automatically cooperates with the server to maintain the cookie.

The same holds in requests. We can create a requests.Session; from then on, requests automatically maintains for us any cookies generated while that Session communicates with the remote server.
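
To see that a Session really carries its cookies along, here is a small offline sketch. It places a cookie in the session's cookie jar by hand (normally the server's response would do this) and then prepares a request without sending it. The host example.com and the cookie session_id=abc123 are assumptions made up for the example.

```python
import requests

session = requests.Session()

# Normally a server response fills the jar; here we add a cookie by hand.
# The name, value and domain are hypothetical.
session.cookies.set("session_id", "abc123", domain="example.com", path="/")

# Prepare (but do not send) a request through the session.
request = requests.Request("GET", "https://example.com/profile")
prepared = session.prepare_request(request)

# The session has attached the stored cookie to the outgoing headers.
print(prepared.headers.get("Cookie"))
```

Every request made through the same session object gets its Cookie header built this way, which is why the login state survives between calls.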

3. POST form

The POST method can send a set of user data to the remote server as a form. After receiving it, the remote server acts according to the contents of the form.

When calling the POST method in requests, the data parameter accepts a Python dictionary. requests automatically serializes the dictionary into the actual form body. For example:

import requests

cs_url    = 'http://httpbin.org/post'
my_data   = {
    'key1' : 'value1',
    'key2' : 'value2'
}

# Send the dictionary as a form-encoded POST body
r = requests.post(cs_url, data=my_data)
print(r.text)
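
To inspect what requests actually puts on the wire, the request can be prepared without being sent. The following is a minimal sketch that reuses the URL and dictionary from the example above and only prints the serialized form body and the content type.

```python
import requests

cs_url = 'http://httpbin.org/post'
my_data = {
    'key1': 'value1',
    'key2': 'value2',
}

# Build the request and serialize it, but do not send it.
prepared = requests.Request('POST', cs_url, data=my_data).prepare()

print(prepared.headers['Content-Type'])  # form-encoded content type
print(prepared.body)                     # the dictionary as a form body
```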

4. Simulating a Real GitHub Login

The first step in simulating a login is to figure out what actually happens when the browser logs in.

The GitHub login page is https://github.com/login. We first clear the browser's cookies and then open the login page in Chrome. After filling in Username and Password, we open Tamper Chrome and Chrome's Inspect Element tool (go to the Network tab), then click the Sign in button.

In Tamper Chrome we find that although the login page is https://github.com/login, the form is actually submitted to https://github.com/session. If the login succeeds, the browser is redirected to the home page https://github.com/, which returns status code 200.

In Chrome's Inspect Element window, we can see the form submitted to the session endpoint. It contains the following fields:

commit
utf8
authenticity_token
login
password

Of these, commit and utf8 are constants; login and password are the username and password, which is easy to understand. The exception is authenticity_token, a long string of seemingly random characters whose origin is not yet clear.

The POST happens before any other exchange with the session endpoint, so the only possible source of the token is the login page. Open the source of the login page and search for authenticity_token; it is not hard to find the following:

<input name="authenticity_token" type="hidden" value="......" />

It turns out that authenticity_token is written in plain text in the HTML page, merely hidden from view by the hidden input type. So we only need Python's regular expression library (re) to extract it.
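
The extraction can be tried offline first. The sketch below runs the regular expression against a made-up HTML fragment; the form markup and the token value EXAMPLE_TOKEN_123== are invented for illustration, and the pattern assumes the attributes appear in exactly this order, as in the snippet above.

```python
import re

# Hypothetical fragment standing in for the real login page source.
sample_html = '''
<form action="/session" method="post">
  <input name="utf8" type="hidden" value="&#x2713;" />
  <input name="authenticity_token" type="hidden" value="EXAMPLE_TOKEN_123==" />
  <input name="login" type="text" />
</form>
'''

# Non-greedy capture so the match stops at the first closing quote.
pattern = re.compile(
    r'<input name="authenticity_token" type="hidden" value="(.*?)" />'
)

match = pattern.search(sample_html)
token = match.group(1) if match else None
print(token)
```

Checking for a missing match (None) is worth keeping in real code, since the pattern stops working as soon as the page markup changes.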

import requests
import re

login_url = 'https://github.com/login'
user = 'user'          # your actual username
password = 'password'  # your actual password
user_headers = {
    'User-Agent' : 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36',
    'Accept' : 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Encoding' : 'gzip',
    'Accept-Language' : 'zh-CN,zh;q=0.8,en;q=0.6,zh-TW;q=0.4'
}

# A Session keeps the cookies from the login page for the later POST
session = requests.Session()
response = session.get(login_url, headers=user_headers)

# Non-greedy match so the capture stops at the first closing quote
pattern = re.compile(r'<input name="authenticity_token" type="hidden" value="(.*?)" />')

# response.text is a str, so it can be searched with a str pattern
authenticity_token = pattern.findall(response.text)[0]

login_data = {
    'commit' : 'Sign in',
    'utf8' : '\u2713',  # check mark; requests percent-encodes it as %E2%9C%93
    'authenticity_token' : authenticity_token,
    'login' : user,
    'password' : password
}

session_url = 'https://github.com/session'
response = session.post(session_url, headers=user_headers, data=login_data)

1. First, we prepare HTTP request headers consistent with Chrome's. Of these, User-Agent is the most important.

2. Mirroring how the browser communicates with the server, we create a requests.Session.

3. We open the login page with the GET method and extract authenticity_token with the regular expression library.

4. We assemble the required data into a Python dictionary, login_data.

5. Finally, we submit the form to the session endpoint with the POST method.

6. After a 302 redirect, the end result is the GitHub home page (200).
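
The redirect in step 6 can be checked on the response object: requests follows redirects by default and stores the intermediate responses in response.history. The helper redirect_chain below is a hypothetical name introduced for this sketch, and the two responses are built by hand as offline stand-ins for what session.post might return after a successful login.

```python
import requests

def redirect_chain(response):
    """Return (status_code, url) pairs for each redirect hop and the final response."""
    hops = list(response.history) + [response]
    return [(r.status_code, r.url) for r in hops]

def fake_response(status_code, url, history=()):
    """Build an offline stand-in for a response (for illustration only)."""
    r = requests.Response()
    r.status_code = status_code
    r.url = url
    r.history = list(history)
    return r

# Stand-ins for the 302 from the session endpoint and the final 200 home page.
redirect = fake_response(302, 'https://github.com/session')
final = fake_response(200, 'https://github.com/', history=[redirect])

chain = redirect_chain(final)
for status, url in chain:
    print(status, url)
```

With the real login code, calling redirect_chain(response) on the result of session.post would list the actual hops.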

Origin www.cnblogs.com/7758520lzy/p/12102734.html