Crawler request module

1 Versions

   Python 2: urllib and urllib2

   Python 3: urllib and urllib2 were merged into urllib.request

 

2 Common methods

2.1 urllib.request.urlopen("URL") sends a request to the site and gets a response

     2.1.1 bytes = response.read()

     string = response.read().decode("utf-8")

     encode(): string ---> bytes

     decode(): bytes ---> string
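
A minimal sketch of the flow above; the URL (httpbin.org) is only a placeholder, not from the original notes:

import urllib.request

# Send the request and receive a response object
response = urllib.request.urlopen("http://httpbin.org/get")

data = response.read()          # bytes
html = data.decode("utf-8")     # str
print(html)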

2.2 Setting a custom User-Agent

   2.2.1 urlopen() does not support a custom User-Agent

  2.2.2 urllib.request.Request("URL", headers=dictionary) supports a custom User-Agent

     The User-Agent is the first battleground in the struggle between crawlers and anti-crawler measures, so always send requests with a User-Agent header.

     2.2.2.1 Usage process (see the sketch after this list)

       2.2.2.1.1 Build the request object with Request

       2.2.2.1.2 Get the response object with urlopen()

       2.2.2.1.3 Get the content with response.read().decode("utf-8")
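
A minimal sketch of the three steps above; the URL and User-Agent string are placeholder assumptions:

import urllib.request

url = "http://httpbin.org/get"             # placeholder URL
headers = {"User-Agent": "Mozilla/5.0"}    # any browser-like UA string

# 1. Build the request object with a custom User-Agent
req = urllib.request.Request(url, headers=headers)
# 2. Get the response object with urlopen()
response = urllib.request.urlopen(req)
# 3. Read and decode the content
html = response.read().decode("utf-8")
print(html)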

  2.2.2.2 Methods of the response object

     2.2.2.2.1 read() reads the content of the server's response

     2.2.2.2.2 getcode()

         Role: returns the HTTP status code, e.g. print(response.getcode())

              200 success

              4xx client error (e.g. page not found)     5xx server error

     2.2.2.2.3 geturl()

          Role: returns the URL of the data actually returned (guards against redirect problems)
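
A small sketch exercising the three response methods, again with a placeholder URL:

import urllib.request

response = urllib.request.urlopen("http://httpbin.org/get")   # placeholder URL

print(response.getcode())                 # HTTP status code, e.g. 200
print(response.geturl())                  # URL actually fetched (after any redirect)
print(response.read().decode("utf-8"))    # response body as a string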

3 urllib.parse module

   3.1 urlencode(dictionary)

        urlencode({"wd":"美女"})       wd=%e%rr.......

  3.2 quote(string)

    

import urllib.parse

baseurl = "http://www.baidu.com/s?wd="
key = input("Please enter the content to be searched: ")
# encode the keyword with quote()
key = urllib.parse.quote(key)
url = baseurl + key
print(url)

 

Request ---> Response (HTML source) ---> Parse
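
A minimal end-to-end sketch of that request/response/parse pipeline; the URL is a placeholder and the regular-expression parse is only a trivial stand-in for real analysis:

import re
import urllib.request

url = "http://httpbin.org/html"            # placeholder URL that returns HTML
headers = {"User-Agent": "Mozilla/5.0"}

# Request
req = urllib.request.Request(url, headers=headers)
# Response (HTML source)
html = urllib.request.urlopen(req).read().decode("utf-8")
# Parse: pull out <h1> text as a trivial example
print(re.findall(r"<h1>(.*?)</h1>", html, re.S))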

 
