1. Introduction to Cookies
HTTP is a stateless protocol. Without some additional mechanism, neither the remote server nor the client can know what happened in earlier exchanges. Cookies are one such mechanism. A typical use of cookies is to remember that a user has logged in to a site:
- After the user logs in successfully, the server sends the client a cookie (usually encrypted).
- The client (usually a web browser) saves the received cookie.
- The next time the client connects to the server, it sends the cookie back. The server checks it and restores the logged-in state, so the user does not have to log in again.
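The three steps above can be sketched with Python's standard `http.cookies` module. This is a minimal, self-contained simulation built on stated assumptions. The server side is a hypothetical in-memory dictionary `SESSIONS`. The cookie name `session_id` and the helper functions are made up for illustration. The cookie carries an opaque random session ID rather than encrypted content.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical server-side session store: session ID -> username.
SESSIONS = {}

def server_login(username):
    """Server: after a successful login, issue a cookie holding a random session ID."""
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = username
    cookie = SimpleCookie()
    cookie['session_id'] = session_id
    cookie['session_id']['path'] = '/'
    return cookie.output(header='Set-Cookie:')  # 'Set-Cookie: session_id=...; Path=/'

def client_store(set_cookie_header, jar):
    """Client: save the cookie carried by a Set-Cookie header into its jar."""
    jar.load(set_cookie_header.split(':', 1)[1].strip())

def client_cookie_header(jar):
    """Client: build the Cookie header to send back on the next request."""
    return '; '.join(f'{name}={morsel.value}' for name, morsel in jar.items())

def server_check(cookie_header):
    """Server: look up the session ID; return the logged-in user or None."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get('session_id')
    return SESSIONS.get(morsel.value) if morsel is not None else None

# Usage: log in once, then the stored cookie restores the login state.
jar = SimpleCookie()
client_store(server_login('alice'), jar)
print(server_check(client_cookie_header(jar)))  # alice
```

A forged or unknown session ID simply maps to no user. This is why a real server makes the ID hard to guess, here by using `secrets.token_hex`.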
2. Using Cookies with requests
When a browser connects to a remote server as a client, the server generates a SessionID and sends it to the browser in a cookie. From then on, as long as the cookie remains valid, the browser and the server keep using this SessionID. The browser automatically works with the server to maintain the corresponding cookie.
The same holds in requests. We can create a requests.Session and use it for all communication with the remote server. requests then maintains the cookies generated during that session for us automatically.
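Here is a minimal sketch of this behavior. It assumes a hypothetical cookie named `session_id` on `httpbin.org`. Instead of sending anything over the network, it uses `Session.prepare_request()` to show two things. A cookie stored in the session's jar is attached to later requests to the same domain, and it is not sent to other domains. In real use, cookies that arrive in a server's `Set-Cookie` headers are added to `session.cookies` automatically.

```python
import requests

# A Session owns one cookie jar that it reuses for every request it sends.
session = requests.Session()

# Hypothetical cookie, as if the server had already set it for httpbin.org.
session.cookies.set('session_id', 'abc123', domain='httpbin.org', path='/')

# prepare_request() merges the session's cookies into an outgoing request,
# which lets us inspect the Cookie header without using the network.
prepared = session.prepare_request(requests.Request('GET', 'http://httpbin.org/cookies'))
print(prepared.headers['Cookie'])  # session_id=abc123

# A request to a different domain does not receive this cookie.
other = session.prepare_request(requests.Request('GET', 'http://example.com/'))
print('Cookie' in other.headers)  # False
```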
3. Submitting Forms with POST
The POST method sends a set of user data to the remote server as a form. After receiving it, the server acts according to the contents of the form.
When calling the post method in requests, we can pass a Python dictionary as the data parameter. requests automatically serializes the dictionary into the actual form content. For example:
import requests

cs_url = 'http://httpbin.org/post'
my_data = {
    'key1': 'value1',
    'key2': 'value2'
}
r = requests.post(cs_url, data=my_data)
print(r.text)  # httpbin echoes the submitted form back as JSON
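"Serializing the dictionary into form content" can be made concrete. We can prepare the same request without sending it, then inspect its body and Content-Type header. This small sketch uses `requests.Request(...).prepare()` and needs no network access.

```python
import requests

my_data = {'key1': 'value1', 'key2': 'value2'}

# Build and prepare the POST request without sending it.
prepared = requests.Request('POST', 'http://httpbin.org/post', data=my_data).prepare()

print(prepared.headers['Content-Type'])  # application/x-www-form-urlencoded
print(prepared.body)                     # key1=value1&key2=value2
```

The body is exactly what a browser would send when submitting an HTML form with two fields named `key1` and `key2`.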
4. Simulating a Login to GitHub
The first step in simulating a login is to figure out what the browser does when we log in.
The GitHub login page is https://github.com/login. We first clear the browser's cookies and then open the login page in Chrome. After filling in the username and password, we open Tamper Chrome and the Network tab of Chrome's developer tools (Inspect Element). Then we click the Sign in button.
Tamper Chrome shows that the login page is https://github.com/login, but the form is actually received by https://github.com/session. If the login succeeds, the browser is then redirected to the GitHub home page, https://github.com/, which returns status code 200.
In Chrome's Inspect Element window, we can see the form submitted to the session endpoint. It contains the following fields:
commit, utf8, authenticity_token, login, password
Of these, commit and utf8 are constants. login and password are the username and password, which are easy to understand. The exception is authenticity_token, a long string of seemingly random characters whose meaning is not obvious.
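The constant utf8 field deserves a brief look. Its value is the check mark character ✓. In the browser's tools it may appear as `%E2%9C%93`, which is just the URL-encoded form of the character's UTF-8 bytes. The sketch below uses only `urllib.parse` to show this encoding. requests encodes form dictionaries the same way, since it uses `urlencode` internally. The sketch also shows why the dictionary should hold the raw character: an already-encoded string gets encoded a second time.

```python
from urllib.parse import urlencode, unquote

check_mark = '\u2713'  # the character ✓

# Form encoding turns the character into its UTF-8 bytes, percent-encoded.
print(urlencode({'utf8': check_mark}))      # utf8=%E2%9C%93
print(unquote('%E2%9C%93') == check_mark)   # True

# Passing the already-encoded text encodes the '%' signs again.
print(urlencode({'utf8': '%E2%9C%93'}))     # utf8=%25E2%259C%2593
```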
The POST to session happens right after our interaction with the login page, so the login page is the only possible source of this value. We open the source of the login page and search for authenticity_token. It is not hard to find the following:
<input name="authenticity_token" type="hidden" value="......" />
It turns out that authenticity_token is written in plain text in the HTML page and is merely hidden with type="hidden". So all we need to do is extract it with Python's regular expression library, re.
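Below is a minimal sketch of the extraction on a hypothetical HTML snippet; the token value `abc123XYZ==` is made up. It uses the regular expression from the text, made non-greedy. For comparison, it also uses the standard library's `html.parser`, which does not depend on the exact attribute order or spacing in the page. Whether GitHub's real markup matches the regex exactly is an assumption.

```python
import re
from html.parser import HTMLParser

# Hypothetical snippet of a login page; a real token is a long random string.
SAMPLE_HTML = '''
<form action="/session" method="post">
  <input name="utf8" type="hidden" value="&#x2713;" />
  <input name="authenticity_token" type="hidden" value="abc123XYZ==" />
</form>
'''

# 1) Regex approach from the text. The non-greedy (.*?) stops at the first
#    closing quote instead of running to the last '" />' on the same line.
pattern = re.compile(r'<input name="authenticity_token" type="hidden" value="(.*?)" />')
print(pattern.findall(SAMPLE_HTML)[0])  # abc123XYZ==

# 2) Parser approach: independent of attribute order and whitespace.
class TokenFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <input ... /> tags are also routed here by HTMLParser.
        attrs = dict(attrs)
        if tag == 'input' and attrs.get('name') == 'authenticity_token':
            self.token = attrs.get('value')

finder = TokenFinder()
finder.feed(SAMPLE_HTML)
print(finder.token)  # abc123XYZ==
```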
import re
import requests

login_url = 'https://github.com/login'
user = 'user'          # your actual account name
password = 'password'  # your actual password

# Request headers copied from Chrome; User-Agent matters most
user_headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Encoding': 'gzip',
    'Accept-Language': 'zh-CN,zh;q=0.8,en;q=0.6,zh-TW;q=0.4'
}

# The Session keeps the cookies from the GET for the later POST
session = requests.Session()
response = session.get(login_url, headers=user_headers)

# Search the decoded text (response.content is bytes in Python 3);
# the non-greedy (.*?) stops at the end of the value attribute
pattern = re.compile(r'<input name="authenticity_token" type="hidden" value="(.*?)" />')
authenticity_token = pattern.findall(response.text)[0]

login_data = {
    'commit': 'Sign in',
    'utf8': '\u2713',  # the check mark; requests URL-encodes it as %E2%9C%93
    'authenticity_token': authenticity_token,
    'login': user,
    'password': password
}
session_url = 'https://github.com/session'
response = session.post(session_url, headers=user_headers, data=login_data)
1. First, we prepare HTTP request headers that match Chrome's. Of these, User-Agent is the most important.
2. To mimic the communication between the browser and the server, we create a requests.Session.
3. We open the login page with the GET method and extract authenticity_token with the re library.
4. We put the required data into a Python dictionary, login_data.
5. Finally, we submit the form to the session endpoint with the POST method.
6. The end result is a 302 redirect, after which the GitHub home page opens (status 200).
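To check the outcome programmatically, one could inspect the redirect chain that requests records. By default, requests follows redirects for POST and keeps the intermediate responses in `response.history`. The sketch below is a hypothetical heuristic based on the behavior observed above: a 302 from the session endpoint, then the home page with 200. The function name `looks_logged_in` and the exact-URL comparison are assumptions, not a documented GitHub contract.

```python
import requests

def looks_logged_in(response, home_url='https://github.com/'):
    """Hypothetical heuristic: the POST was answered with a 302 redirect and
    the final page is the home page, loaded with status 200."""
    redirected = any(r.status_code == 302 for r in response.history)
    return redirected and response.status_code == 200 and response.url == home_url

# Offline demo with hand-built Response objects standing in for real ones.
hop = requests.Response()
hop.status_code = 302
final = requests.Response()
final.status_code = 200
final.url = 'https://github.com/'
final.history = [hop]
print(looks_logged_in(final))  # True

failed = requests.Response()  # hypothetical failure: no redirect happened
failed.status_code = 200
failed.url = 'https://github.com/session'
print(looks_logged_in(failed))  # False
```

With the login code above, it would be called on the result of `session.post(...)`, for example `looks_logged_in(response)`.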