One day reptiles open class learning

Learning Link http://stu.ityxb.com/openCourses/detail/238

 

What are reptiles:

  Web crawler is to simulate a network browser sends a request to accept a request response information automatically crawl the Internet according to certain rules

Reptile uses:

  Data Acquisition (Baidu news, headlines today), 12306 grab votes, the network automatically vote,

Debugging tools:    

  Fn + F12  

      

 

 

 

 

 

Browser request process:

    

 

 

 

 

 URL rules

    

 

http request

      

 

 

 

 

 http request is an important part of

  Request URL, the request mode (post, GET), the request header, the request body

http response format

  

 

 

 http an important part of the response

  Response status code: 404, 500, 200 (successful)

   Responsive head,

   Response body (html content)

Ruquests module

  Python is a module that can simulate the browser sends the request acquisition response

Learning materials:

http://cn.python-requests.org/zh_CN/latest/

 

 

 

 

 installation

pip install requests

 

Website crawling steps:

Step one: analysis

  Request url, mode request, the first request, request parameters

 

Step II: Analog browser sends a request acquisition response

'' ' 
URL https://www.baidu.com/baidu?wd=%E7%9F%B3%E5%AE%B6%E5%BA%84%E5%AD%A6%E9%99%A2 
request method get 
request header User-Agent: Mozilla / 5.0 ( Windows NT 10.0; Win64; x64; rv: 74.0) Gecko / 20100101 Firefox / 74.0 
request parameter wd =% E7% 9F% B3 % E5% AE% B6% E5% BA? the AD E5% 84%%%% 99% E9% A6 A2 
'' ' 
# 1. import module 
import requests
 # 2. analog transmission request acquirer 

response = requests.get ( 
    URL = " https://www.baidu.com / baidu / S " , 
    headers = {
          " the User-- Agent " : "Mozilla / 5.0 (the Windows NT 10.0; Win64; x64-; RV: 74.0) the Gecko / Firefox 20,100,101 / 74.0 " , 
    } 

) 

# 3. The results of the content of the response processing 
with Open ( ' fetch response content .html ' , ' W ' , = encoding ' UTF8 ' ) AS F: 
    f.write (response.text)  

 Implement a custom request parameter

 

 

 

 

 

 

Guess you like

Origin www.cnblogs.com/xingyuner/p/12547596.html
Recommended