Crawler (Requests library: get and post usage)

Requests library

Introduction

  • Requests is a Python library for making HTTP requests. It provides a simple, intuitive API for sending HTTP and HTTPS requests and handling the responses.

requests.get() function

Parameters

  • url: the URL to request.

  • headers: usually used for User-Agent (UA) spoofing, so the server does not identify the request as coming from a script. To find a User-Agent value, open the browser's developer tools (right-click → Inspect), switch to the Network tab, refresh the page (Fn+F5), click any request, and look for the user-agent field in the request headers.

  • proxies: usually used for batch crawling, to keep the server from noticing frequent requests from the same machine and banning the host.

  • cookies: a dictionary of cookies sent with the request, which lets the server associate the request with saved user information. Cookie values can be found the same way: open developer tools (right-click → Inspect), switch to the Network tab, refresh (Fn+F5), and look for the cookie field in a request.

  • params: a dictionary of query-string parameters appended to the URL, convenient and flexible to use.
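The effect of params can be inspected without sending any request, by building a prepared request (the URL and parameter here are only illustrative):

```python
import requests

# params is a dict of query-string key/value pairs; requests URL-encodes
# them and appends them to the URL automatically.
req = requests.Request('GET', 'https://baidu.com/s', params={'wd': 'python'})
prepared = req.prepare()
print(prepared.url)  # https://baidu.com/s?wd=python
```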

Example

import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
url = 'https://baidu.com'
proxies = {
    'http': 'http://10.0.0.1:8080', 'https': 'https://10.0.0.1:8080'}
response = requests.get(url=url, headers=headers)  # pass proxies=proxies to route through a proxy
print(response.text)  # prints the fetched HTML document

requests.get() return value

  • status_code: the HTTP status code; 200 means the request succeeded.
  • text: the response body decoded as text (e.g. the HTML document).
  • content: the raw response body as bytes; useful when the response is not text, such as an image or other binary file.
  • json() method: data = response.json() parses the response body as JSON and returns the resulting object.
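The attributes above can be illustrated on a hand-built Response object (normally you would get one back from requests.get(); setting _content directly is only a trick to keep this sketch offline):

```python
import requests

# Build a Response by hand so the example needs no network access.
resp = requests.models.Response()
resp.status_code = 200
resp.encoding = 'utf-8'
resp._content = b'{"name": "test"}'  # raw body bytes, as if received from a server

print(resp.status_code)  # 200 -> the request "succeeded"
print(resp.content)      # raw bytes of the body
print(resp.text)         # body decoded as text using resp.encoding
data = resp.json()       # body parsed as JSON into a Python dict
print(data['name'])      # test
```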

The fetched data can then be parsed with XPath, bs4 (BeautifulSoup), and similar tools.
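As a minimal sketch of the parsing step, here is XPath via lxml applied to a small hard-coded HTML string (in practice the string would come from response.text; the markup here is made up):

```python
from lxml import etree

# HTML as it might be obtained from response.text
html_text = '<html><body><h1>Title</h1><a href="/next">next</a></body></html>'

tree = etree.HTML(html_text)              # parse the HTML into an element tree
print(tree.xpath('//h1/text()'))          # ['Title']   -> text of the <h1> tag
print(tree.xpath('//a/@href'))            # ['/next']   -> href attribute of the <a> tag
```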

requests.post() function

Introduction

  • Sends a POST request with the given data to the specified url, and returns a requests.Response object containing the server's response.

Parameters

  • url: the URL to send the request to.
  • data: the data to send in the request body; can be a dictionary, a list of tuples, bytes, or a file object.
  • json: a JSON-serializable object to send as the request body.
  • cookies: same as in the get method above.
  • proxies: same as in the get method above.

Example

import requests
url = 'https://www.begtut.com/try/python/demopage.php'
data = {
    'somekey': 'somevalue'}
response = requests.post(url, data=data)
print(response)  # prints the Response object, e.g. <Response [200]>
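The difference between the data and json parameters can be seen without sending anything, by inspecting prepared requests (the URL here is illustrative):

```python
import requests

# data= sends a form-encoded body
form = requests.Request('POST', 'https://example.com/api', data={'k': 'v'}).prepare()
print(form.body)                     # k=v
print(form.headers['Content-Type'])  # application/x-www-form-urlencoded

# json= serializes the object to JSON and sets the JSON content type
j = requests.Request('POST', 'https://example.com/api', json={'k': 'v'}).prepare()
print(j.body)                        # the JSON-serialized body
print(j.headers['Content-Type'])     # application/json
```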

Remarks

This can be combined with XPath and similar parsing tools; see the linked post: XPath introduction and syntax.

Origin blog.csdn.net/xiaziqiqi/article/details/131348457