Python basic programming:

This article describes how to simulate logging in to GitHub with the Python requests library. The sample code is explained in detail and may be a useful reference for study or work.

  1. Cookie Introduction

HTTP is a stateless protocol. Without some additional mechanism, neither the remote server nor the client can know what was exchanged in earlier communications. Cookies are one such mechanism. A typical use of a cookie is to record that a user has logged in to a site.

After the user logs in successfully, the server sends a cookie (usually with an encrypted value) in its response.
The client (usually a web browser) saves the received cookie.
The next time the client connects, it sends the cookie back to the server; the server checks its contents and restores the logged-in state, so the user does not have to log in again.
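
The three steps above can be sketched with the standard library's http.cookies module. This is only an illustration of the headers involved; the cookie name sessionid and its value are made-up placeholders, and a real server would also sign or encrypt the value.

```python
from http.cookies import SimpleCookie

# Step 1 (server): after a successful login, issue a cookie.
# "sessionid" and "abc123" are hypothetical placeholders.
server_cookie = SimpleCookie()
server_cookie["sessionid"] = "abc123"
server_cookie["sessionid"]["path"] = "/"
server_cookie["sessionid"]["httponly"] = True
set_cookie_header = server_cookie.output(header="").strip()

# Step 2 (client): parse the Set-Cookie value and remember name=value.
client_jar = SimpleCookie()
client_jar.load(set_cookie_header)
cookie_header = "; ".join(
    f"{name}={morsel.value}" for name, morsel in client_jar.items()
)

# Step 3 (server, next request): read the Cookie header to recognise the user.
received = SimpleCookie()
received.load(cookie_header)

print("Set-Cookie:", set_cookie_header)
print("Cookie:", cookie_header)
print("Recognised session:", received["sessionid"].value)
```
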

  2. Using cookies with requests

When a browser connects to a remote server, the server generates a SessionID as needed and attaches it to a cookie sent to the browser. On subsequent connections, as long as the cookie is still present, the browser and the server both use this SessionID; the browser cooperates with the server automatically to maintain the cookie.

The same applies in requests. We can create a requests.Session and communicate with the remote server through it; requests then automatically maintains the cookies generated along the way.
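
A minimal sketch of this behaviour follows. The function takes the session as a parameter so it works with any object that behaves like requests.Session; the function name and the httpbin.org cookie endpoints are assumptions of this sketch, not something the original article uses.

```python
def cookies_seen_by_server(session, base_url="http://httpbin.org"):
    """Set a cookie with one request, then report what the next request sends.

    `session` is expected to behave like requests.Session. The httpbin.org
    endpoints are an assumption of this sketch.
    """
    # First request: the server answers with a Set-Cookie header,
    # which the session stores on our behalf.
    session.get(base_url + "/cookies/set/sessioncookie/123456789")
    # Second request: the session sends the stored cookie back automatically,
    # and the server echoes the cookies it received as JSON.
    response = session.get(base_url + "/cookies")
    return response.json()["cookies"]

# Usage (needs network access and the requests package):
#   import requests
#   print(cookies_seen_by_server(requests.Session()))
```

If httpbin.org is reachable, the returned dictionary should contain sessioncookie, even though our code never handled the cookie explicitly.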

  3. POST form

The POST method sends a set of user data to the remote server as a form. After receiving it, the server acts according to the form's contents.

When calling the POST method, requests accepts a Python dictionary as the data parameter and automatically serializes it into the actual form body. For example:

import requests

cs_url = 'http://httpbin.org/post'
my_data = {
  'key1': 'value1',
  'key2': 'value2'
}

# requests form-encodes the dictionary and sends it as the request body
r = requests.post(cs_url, data=my_data)
print(r.text)
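
To see roughly what ends up on the wire, the same dictionary can be form-encoded with the standard library. This is an illustration of the application/x-www-form-urlencoded format, not requests' internal code.

```python
from urllib.parse import urlencode, parse_qs

my_data = {'key1': 'value1', 'key2': 'value2'}

# Form-encode the dictionary the way an HTML form submission would.
body = urlencode(my_data)
print(body)

# A server reverses the process to recover the fields
# (each value comes back as a list, since a field may repeat).
fields = parse_qs(body)
print(fields)
```
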

  4. Simulating a GitHub login in practice

The first step in simulating a login is to figure out what happens when the browser logs in.

The GitHub login page is https://github.com/login. First clear the browser's cookies, then open the login page in Chrome. After filling in the username and password, open Tamper Chrome and Chrome's Inspect Element tool (Network tab), then click the sign-in button.

In Tamper Chrome we find that although the login page is https://github.com/login, the form is actually submitted to https://github.com/session. If the login succeeds, the browser is redirected to the home page https://github.com/, which returns status code 200.
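
The same redirect can be observed from Python. requests follows redirects by default and keeps the intermediate responses in response.history. The helper names below are hypothetical, and the success test is only a heuristic based on the behaviour described above, not a documented GitHub guarantee.

```python
def status_chain(response):
    """Return every status code seen on the way to the final response."""
    return [r.status_code for r in response.history] + [response.status_code]

def looks_logged_in(response):
    """Heuristic: a 302 redirect followed by a 200 page that is not the login form."""
    codes = status_chain(response)
    redirected = 302 in codes[:-1]
    final_ok = codes[-1] == 200
    # Assumption: a failed login leaves us on /login or /session.
    back_on_form = response.url.rstrip("/").endswith(("/login", "/session"))
    return redirected and final_ok and not back_on_form

# Usage, given the response of a POST to https://github.com/session:
#   print(status_chain(response), looks_logged_in(response))
```
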
In Chrome's Inspect Element window we can see the form data submitted to the session endpoint. It contains:

commit
utf8
authenticity_token
login
password

Of these, commit and utf8 are constants, and login and password are simply the user name and password. Only authenticity_token, a long string of seemingly random characters, is unexplained.
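
As an aside on the constant fields: if the captured request shows utf8 as %E2%9C%93, that is the percent-encoded form of a check mark character (U+2713). Libraries that form-encode values themselves, requests included, expect the decoded character; passing the already-encoded string would encode it a second time. A quick check with the standard library:

```python
from urllib.parse import unquote, urlencode

captured = '%E2%9C%93'          # value as it appears in the captured request
decoded = unquote(captured)     # the actual character behind it
print(decoded == '\u2713')      # True: it is the check mark

# Encoding the decoded character reproduces the captured form.
print(urlencode({'utf8': decoded}))    # utf8=%E2%9C%93

# Encoding the captured string again escapes every percent sign.
print(urlencode({'utf8': captured}))   # utf8=%25E2%259C%2593
```
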

Nothing is exchanged with the session endpoint before the POST, so the only possible source of this value is the login page. Opening the login page's source and searching for authenticity_token quickly turns up the following:

<input name="authenticity_token" type="hidden" value="......" />

So authenticity_token is written directly into the HTML page, merely concealed in a hidden input. We only need Python's regular expression library to extract it, like this:

import requests
import re

login_url = 'https://github.com/login'
user = 'user'  # your account name
password = 'password'  # your password
user_headers = {
  'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36',
  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
  'Accept-Encoding': 'gzip',
  'Accept-Language': 'zh-CN,zh;q=0.8,en;q=0.6,zh-TW;q=0.4'
}

# One session for both requests, so the cookies from the GET are sent with the POST
session = requests.Session()
response = session.get(login_url, headers=user_headers)
# Capture everything up to the closing quote of the value attribute
pattern = re.compile(r'<input name="authenticity_token" type="hidden" value="([^"]*)" />')

# response.text is a str, matching the str pattern above (response.content is bytes)
authenticity_token = pattern.findall(response.text)[0]

login_data = {
  'commit': 'Sign in',
  'utf8': '\u2713',  # check mark; requests percent-encodes it as %E2%9C%93
  'authenticity_token': authenticity_token,
  'login': user,
  'password': password
}

session_url = 'https://github.com/session'
response = session.post(session_url, headers=user_headers, data=login_data)

  1. First, we prepare HTTP request headers consistent with Chrome's. The User-Agent header is the most important of these.

  2. Mirroring the communication between browser and server, we create a requests.Session.

  3. We open the login page with the GET method and extract authenticity_token with a regular expression.

  4. We assemble the required data into a Python dictionary, login_data.

  5. Finally, we submit the form to the session endpoint with the POST method.

  6. After a 302 redirect, the final result is the GitHub page, returned with status 200.
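
One caveat about step 3: a regular expression for authenticity_token depends on the exact attribute order and spacing of the input tag, so it stops matching if the page markup changes even slightly. A more tolerant alternative, sketched here with the standard library's html.parser, reads the attributes in any order. The class and function names are hypothetical, and the sample HTML is made up for the demonstration.

```python
from html.parser import HTMLParser

class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        if attributes.get("type") == "hidden" and "name" in attributes:
            # Keep the first occurrence of each field name.
            self.hidden.setdefault(attributes["name"], attributes.get("value") or "")

def extract_hidden_field(html, name):
    """Return the value of the named hidden input, or None if it is absent."""
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.hidden.get(name)

# Made-up sample with the attributes in a different order than the regex expects.
sample = ('<form><input value="tok123" name="authenticity_token" type="hidden" />'
          '<input type="text" name="login"></form>')
print(extract_hidden_field(sample, "authenticity_token"))
```

With a real page, the call would be extract_hidden_field(response.text, "authenticity_token"), where response is the result of the GET request to the login page.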


Origin blog.csdn.net/chengxun02/article/details/104999037