The principle of anti-hotlinking
The HTTP standard defines a special request header, Referer, which records two things: first, the address an inbound request came from; and second, for a resource file such as an image, the address of the page that embeds it. All anti-hotlinking schemes are therefore based on checking this Referer field.
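To make the mechanism concrete, here is a minimal sketch of the check a server might run; the function name and the host whitelist are my own assumptions, not anything the site publishes:

```python
from urllib.parse import urlparse

# Assumed whitelist of hosts that are allowed to embed the images
ALLOWED_HOSTS = {"www.mn52.com", "mn52.com"}

def is_hotlink(referer):
    """Return True if the request should be rejected with a 403."""
    if not referer:
        return True  # many sites also reject an empty Referer
    return urlparse(referer).netloc not in ALLOWED_HOSTS

print(is_hotlink("https://evil.example/page"))      # True: foreign site
print(is_hotlink("https://www.mn52.com/a/1.html"))  # False: same site
```

A request whose Referer points at a foreign site (or is missing) is refused, which is exactly why fetching an image URL directly fails.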
Many sites use anti-hotlinking as an anti-crawler mechanism: once it is enabled, requesting an image URL directly returns a 403 error. The fix is actually simple: add request headers, and set the Referer field in them.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36',
    'Referer': url
}
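The same idea can be wrapped in a small helper; the function name build_headers is my own, and the header values are the ones used throughout this article:

```python
# Hypothetical helper: build headers that pass a Referer-based hotlink check.
def build_headers(page_url):
    """Return request headers that claim the request came from page_url."""
    return {
        'User-Agent': ('Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                       '(KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'),
        'Referer': page_url,  # pretend the image is embedded in this page
    }
```

The returned dict can be passed straight to requests.get(img_url, headers=build_headers(page_url)).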
This article crawls the beauty-photo gallery at https://www.mn52.com/; the complete script is as follows:
# Required libraries
import requests
import re
import os
from multiprocessing import Pool

# Main function
def get_img(url):
    # Image storage path
    path = './mn52/'
    if not os.path.exists(path):
        os.mkdir(path)
    # Request headers: the image URLs are protected by an anti-hotlink
    # check, so 'Referer' must be set
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36',
        'Referer': url
    }
    try:
        # Request the list page
        response = requests.get(url=url, headers=headers)
        # print(response.text)
        # Extract the detail-page links with a regex and iterate over them
        res_paging = re.findall('<div class="picbox">.*?<a href="(.*?)"', response.text, re.S)
        for i in res_paging:
            # Build the full detail-page URL
            url_infos = 'https://www.mn52.com' + i
            # Request the detail page
            res_details = requests.get(url=url_infos, headers=headers)
            # Extract the image URLs
            res_detail = re.findall('<div class="img-wrap">.*?<img.*?rel="(.*?)"/>', res_details.text, re.S)
            for i in res_detail:
                # Full image URL
                img_urls = 'https:' + i
                # Name the image after the last path segment
                filename = i.split('/')[-1]
                # Skip images that have already been downloaded
                if os.path.exists(path + str(filename)):
                    print('image already exists')
                else:
                    # Request the image itself
                    res = requests.get(url=img_urls, headers=headers)
                    # Save the image
                    with open(path + str(filename), 'wb') as f:
                        f.write(res.content)
                    # Print download progress
                    print('downloading: ' + img_urls)
    except Exception as e:
        print(e)

# Program entry point
if __name__ == '__main__':
    # Build the full list of page URLs
    urls = ['https://www.mn52.com/meihuoxiezhen/list_2_{}.html'.format(i) for i in range(1, 94)]
    # Create the process pool
    pool = Pool()
    # Start crawling
    pool.map(get_img, urls)
    print('crawl complete')
There are quite a few images, so the download takes a while; the console prints the progress as it goes. Open the folder to check whether the images downloaded successfully.
Done.