[Notes] Python crawler: simulating user form submission and login (3)

These are study notes on the textbook "python network data collection"; most of the code comes from that book.

  Most web forms consist of some HTML fields, a submit button, and a "result" page that the browser jumps to after the form is processed (the value of the form's action attribute). Although these fields usually hold text, they can also carry file uploads or other non-text content. All of these stand between you and the data you want to fetch. Without further ado, let's begin.

  1. HTTP basic access authentication

Before cookies were invented, the most common way to handle website login was HTTP basic access authentication.

import requests
from requests.auth import AuthBase
from requests.auth import HTTPBasicAuth

auth = HTTPBasicAuth('ryan', 'password')
r = requests.post(url="http://pythonscraping.com/pages/auth/login.php", auth=auth)
print(r.text)

Although this looks like an ordinary POST request, an HTTPBasicAuth object is passed to the request as the auth parameter. The response is the page protected by the user name and password (or an access-denied page if authentication fails).
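To make the auth parameter less of a black box, here is a minimal sketch of the Authorization header that basic access authentication sends: the user name and password are joined with a colon and Base64-encoded. The helper name basic_auth_header is hypothetical and exists only for illustration; in real code HTTPBasicAuth builds this header for you.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of the Authorization header for HTTP basic auth."""
    # Credentials are joined as "username:password" and Base64-encoded
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return "Basic " + token


# Same credentials as in the example above
header_value = basic_auth_header("ryan", "password")
print(header_value)
```

Base64 is an encoding, not encryption, so anyone who sees the header can recover the password.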

  2. General form processing

Form Source:

<form method="post" action="processing.php">
  First name: <input type="text" name="firstname"><br>
  Last name: <input type="text" name="lastname"><br>
  <input type="submit" value="Submit">
</form>

Note: First, the names of the two input fields are firstname and lastname, and this matters. The field names determine the names of the variables that are sent to the server when the form is submitted. If you want to simulate the form submission, your variable names must match the field names one to one. Second, the form's action actually takes place at processing.php (absolute path http://pythonscraping.com/files/processing.php). Any POST request for the form must be sent to this page, not to the page where the form itself is located.

Python code:

import requests
params = {'firstname': 'Ryan', 'lastname': 'Mitchell'}
r = requests.post("http://pythonscraping.com/files/processing.php", data=params)
print(r.text)
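The two points in the note can be checked with the standard library alone. The sketch below resolves the form's relative action against the page that contains the form, and shows how a dictionary of field names is encoded into a POST body. The form page URL http://pythonscraping.com/files/form.html is an assumption made for illustration; only the resolved processing.php address is given in the text.

```python
from urllib.parse import urlencode, urljoin

# Assumed location of the page that contains the form (not stated in the notes)
form_page = "http://pythonscraping.com/files/form.html"

# The action attribute is relative, so resolve it against the form page
action_url = urljoin(form_page, "processing.php")
print(action_url)

# Keys must match the name attributes of the input fields one to one
params = {'firstname': 'Ryan', 'lastname': 'Mitchell'}
body = urlencode(params)
print(body)
```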

  3. A slightly more complex form

A note up front: you do not need to understand what every line of the markup means; it is enough to pick out the information you need.

html:

<form action="http://post.oreilly.com/client/o/oreilly/forms/quicksignup.cgi" id="example_form2" method="POST">
  <input name="client_token" type="hidden" value="oreilly" />
  <input name="subscribe" type="hidden" value="optin" />
  <input name="success_url" type="hidden" value="http://oreilly.com/store/newsletter-thankyou.html" />
  <input name="error_url" type="hidden" value="http://oreilly.com/store/newsletter-signup-error.html" />
  <input name="topic_or_dod" type="hidden" value="1" />
  <input name="source" type="hidden" value="orm-home-t1-dotd" />
  <fieldset>
    <input class="email_address long" maxlength="200" name="email_addr" size="25" type="text" value="Enter your email here" />
    <button alt="Join" class="skinny" name="submit" onclick="return addClickTracking('orm','ebook','rightrail','dod');" value="submit">Join</button>
  </fieldset>
</form>

Although this looks intimidating at first, in most cases (an exception will be introduced later) you only need to focus on two things:
  • the name of the field whose data you want to submit (in this case email_addr)
  • the form's action attribute, which is the page that the form data is posted to and that handles the submission (in this case http://post.oreilly.com/client/o/oreilly/forms/quicksignup.cgi)

Add the corresponding information to the request and run the code:

import requests

params = {'email_addr': '[email protected]'}
r = requests.post("http://post.oreilly.com/client/o/oreilly/forms/quicksignup.cgi", data=params)
print(r.text)
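When a form is long, the field names and the action can also be picked out programmatically instead of by eye. The following is a minimal sketch using the standard library's html.parser; the class name FormFieldParser is hypothetical, and FORM_HTML is a shortened copy of the form above that keeps only a few of its fields.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect a form's action URL and the name/value of each input."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.action = attrs.get("action")
        elif tag == "input" and "name" in attrs:
            # Inputs without a value attribute are recorded as empty strings
            self.fields[attrs["name"]] = attrs.get("value", "")


# Shortened copy of the form above (illustrative subset of its fields)
FORM_HTML = """
<form action="http://post.oreilly.com/client/o/oreilly/forms/quicksignup.cgi" method="POST">
  <input name="client_token" type="hidden" value="oreilly" />
  <input name="subscribe" type="hidden" value="optin" />
  <input name="email_addr" type="text" value="Enter your email here" />
</form>
"""

parser = FormFieldParser()
parser.feed(FORM_HTML)
print(parser.action)
print(parser.fields)
```

The printed dictionary lists every named input, so the field you need (here email_addr) and the action URL can be read off directly.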

  4. Submitting files and images

html:

<h2>Upload a file!</h2>
<form action="../pages/files/processing2.php" method="post" enctype="multipart/form-data">
  Submit a jpg, png, or gif: <input type="file" name="uploadFile"><br>
  <input type="submit" value="Upload File">
</form>

python:

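import requests

# The key must match the name attribute of the file input; open the image in binary mode
with open('../files/Python-logo.png', 'rb') as image:
    files = {'uploadFile': image}
    r = requests.post("http://pythonscraping.com/pages/processing2.php", files=files)
print(r.text)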
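Besides a bare file object, the files argument of requests also accepts a tuple of (filename, file object, content type) as the value, which is useful when the server checks the declared type. The sketch below builds such a tuple and guesses the content type from the file name. The helper name file_part is hypothetical, and an in-memory buffer stands in for the real image so the example runs without the file on disk.

```python
import io
import mimetypes


def file_part(filename, fileobj):
    """Return a (filename, file object, content type) tuple for files=."""
    content_type, _ = mimetypes.guess_type(filename)
    # Fall back to a generic binary type when the extension is unknown
    return (filename, fileobj, content_type or "application/octet-stream")


# In-memory stand-in for open('../files/Python-logo.png', 'rb')
fake_image = io.BytesIO(b"not a real image")
files = {"uploadFile": file_part("Python-logo.png", fake_image)}
print(files["uploadFile"][0], files["uploadFile"][2])

# To send it, pass the dictionary exactly as in the example above:
# requests.post("http://pythonscraping.com/pages/processing2.php", files=files)
```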

 


Origin www.cnblogs.com/dfy-blog/p/11519423.html