A Simple Python 3 Crawler for Downloading Images from a Web Page

There are many examples online of image crawlers written in Python 2, but they are of little use to beginners, who usually work in a Python 3 environment that is not compatible with Python 2.
So I wrote a simple example in Python 3 syntax that fetches a page and downloads the pictures on it. I hope it helps, and criticism is welcome.
import urllib.request
import re
import os
# Fetch the page at the given URL and return its HTML source code
def getHtml(url):
    page = urllib.request.urlopen(url)
    html = page.read()
    return html.decode('UTF-8')

def getImg(html):
    reg = r'src="(.+?\.jpg)" pic_ext'
    imgre = re.compile(reg)
    imglist = imgre.findall(html)  # collect the addresses of all matching pictures on the page into imglist
    x = 0
    path = 'D:\\test'
    # Save the pictures to the D:\test folder; create the folder if it does not exist
    if not os.path.isdir(path):
        os.makedirs(path)
    paths = path + '\\'  # files are saved under the test folder

    for imgurl in imglist:
        # Download each image URL in imglist and save it locally; format() builds the file name
        urllib.request.urlretrieve(imgurl, '{0}{1}.jpg'.format(paths, x))
        x = x + 1
    return imglist

html = getHtml("http://tieba.baidu.com/p/2460150866")  # fetch the page at this URL; html is the page source
print(getImg(html))  # parse the page source, then download and save the images
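
To see what the regular expression in getImg matches without touching the network, the sketch below runs the same pattern on a hand-written HTML fragment. The fragment, the example.com URLs, and the helper names extract_image_urls and build_filename are made up for illustration and are not part of the original script. The pattern only accepts a src value that is directly followed by a pic_ext attribute, so pages without that attribute will give an empty list. build_filename shows one possible alternative to joining the folder and file name by hand, using os.path.join.

```python
import os
import re

# Same pattern as in getImg: capture the shortest src value ending in .jpg
# that is immediately followed by a pic_ext attribute.
IMG_PATTERN = re.compile(r'src="(.+?\.jpg)" pic_ext')

# Hypothetical HTML fragment, written by hand for this demonstration.
# Each tag sits on its own line.
SAMPLE_HTML = (
    '<img class="BDE_Image" src="http://example.com/a/first.jpg" pic_ext="jpeg">\n'
    '<img src="http://example.com/b/logo.png">\n'
    '<img class="BDE_Image" src="http://example.com/c/second.jpg" pic_ext="jpeg">'
)


def extract_image_urls(html):
    """Return every .jpg URL that the pattern finds in html."""
    return IMG_PATTERN.findall(html)


def build_filename(folder, index):
    """Join folder and a numbered file name in a platform-independent way."""
    return os.path.join(folder, '{0}.jpg'.format(index))


urls = extract_image_urls(SAMPLE_HTML)
print(urls)
```

This prints the two .jpg addresses and skips the .png one. Because "." does not match a newline by default, the .png tag on its own line cannot be swallowed into a longer match; if all tags were on a single line, the lazy ".+?" could run from the .png address through to the next .jpg, so it is worth checking the results on a real page before downloading.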
    

 


Origin www.cnblogs.com/roboot/p/11410323.html