Learning example: crawling the full text of the novel "Doupo Cangqiong" (Battle Through the Heavens)

Prerequisites: basic Python syntax

               regular expressions

Development environment: Eclipse + PyDev (on Windows)

URL to crawl: www.doupoxs.com/doupocangqiong/

import requests
import re
import time

# Send a browser User-Agent header with every request to make the crawler more stable
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36'}

# Create the txt file if needed and open it for appending.
# The raw string keeps the backslashes literal; UTF-8 avoids encoding errors on Windows.
f = open(r'D:\Pyproject\doupo\doupo.txt', 'a+', encoding='utf-8')

def get_info(url):
    # Crawl the text of a single page
    res = requests.get(url, headers=headers)
    if res.status_code == 200:  # 200 means the request succeeded; anything else is treated as a failure and skipped
        # Decode the page as UTF-8 and capture the text inside every <p>...</p>
        contents = re.findall('<p>(.*?)</p>', res.content.decode('utf-8'), re.S)
        for content in contents:
            f.write(content + '\n')  # write the text matched by the regular expression to the txt file
    else:
        pass

if __name__ == '__main__':
    # All pages to crawl: 2.html through 1664.html
    urls = ['http://www.doupoxs.com/doupocangqiong/{}.html'.format(str(i)) for i in range(2, 1665)]
    for url in urls:
        get_info(url)
        time.sleep(1)  # wait one second between requests
f.close()  # close the file
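
The extraction step inside get_info can be tried offline. The sketch below runs the same kind of pattern on a small, made-up HTML fragment (sample_html is a hypothetical stand-in, not a real page from the site) to show why the non-greedy group and the re.S flag matter:

```python
import re

# Hypothetical fragment standing in for a downloaded chapter page.
sample_html = """<div class="content">
<p>First paragraph.</p>
<p>Second paragraph,
split across two lines.</p>
</div>"""

# The non-greedy (.*?) stops at the nearest </p>, so each paragraph
# is captured separately; re.S lets "." match newline characters too.
paragraphs = re.findall(r'<p>(.*?)</p>', sample_html, re.S)

for paragraph in paragraphs:
    print(paragraph)
```

Without re.S the second paragraph would not match at all, because "." would stop at the line break. With a greedy (.*) instead of (.*?), both paragraphs would be merged into a single match.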

Result of the run:

For how to obtain the request headers, see my other post; I won't repeat it here: https://www.cnblogs.com/junecode/p/11306266.html
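
As a rough illustration of what the headers dictionary does, the sketch below attaches the same User-Agent string to a request object and reads it back. It uses the standard library's urllib.request instead of requests, purely so that the header can be inspected without sending anything over the network; this is a stand-in for illustration, not what the script above uses.

```python
from urllib.request import Request

# Same User-Agent string as in the script above.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/75.0.3770.142 Safari/537.36'
}

# Building a Request object does not contact the server.
req = Request('http://www.doupoxs.com/doupocangqiong/2.html', headers=headers)

# urllib stores header names in capitalized form ("User-agent").
print(req.get_header('User-agent'))
```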

Origin www.cnblogs.com/junecode/p/11330183.html