After successfully crawling Douban film reviews, I felt pretty good about web crawling, so this time I wanted to crawl some images for fun ...
Sogou Pictures URL: https://pic.sogou.com/?from=category
First, here is the final working source code:
import json
import urllib.request

import requests
from fake_useragent import UserAgent


def getSougouImag(category, length, path):
    n = length
    cate = category
    imgs_url = []  # empty list that collects the image URLs
    m = 0  # counter used to number the downloaded images
    url = 'https://pic.sogou.com/pics/channel/getAllRecomPicByTag.jsp?category=' + cate + '&tag=%E5%85%A8%E9%83%A8&start=0&len=' + str(n)
    headers = {'User-Agent': UserAgent().random}  # set a random UA
    f = requests.get(url, headers=headers)  # send a GET request
    print(f.status_code)
    js = json.loads(f.text)
    js = js['all_items']
    for j in js:
        imgs_url.append(j['thumbUrl'])
    for img_url in imgs_url:
        print('***** ' + str(m) + '.jpg *****' + ' Downloading...')
        urllib.request.urlretrieve(img_url, path + str(m) + '.jpg')  # download the URL to a local file
        m += 1
    print('Download complete!')


getSougouImag('wallpaper', 500, r'D:\souGouImg/')
The result:
Below are the steps I went through as a crawler novice ...
1. First, open the page and look at the HTML source
Press F12 to open the developer tools -> right-click the image -> click Inspect
The information shown in the red box appears, and it is easy to see that the image URL is the value of the src attribute of the img tag.
So easy? Just grab the value of the src attribute and download it, and everything is done, right?
Enough talk, let's get to work.
from bs4 import BeautifulSoup
import requests
from fake_useragent import UserAgent  # UA library

url = 'https://pic.sogou.com/pics/recommend?category=%B1%DA%D6%BD&from=home#%E5%85%A8%E9%83%A8%269'
headers = {'User-Agent': UserAgent().random}  # set a random UA
f = requests.get(url, headers=headers)  # send a GET request
print(f.status_code)  # print the status code
soup = BeautifulSoup(f.text, 'lxml')  # parse the page content with the lxml parser
print(soup.select('img'))  # select all img tags and print them with their attributes
The code produces the following output:
The printed HTML was not the same as what the browser showed, and none of it was the source URL of the pictures. I guessed that the pictures were loaded dynamically and kept searching on Baidu ... until I found an article by an expert, which is where I learned the following way of finding them.
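To make the difference concrete, here is a minimal sketch of why the img tags go missing. It uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages, and both HTML snippets (and the example.com URLs) are made up for illustration; they are not the real Sogou markup.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in a document."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            src = dict(attrs).get('src')
            if src:
                self.srcs.append(src)


def collect_img_srcs(html):
    parser = ImgSrcCollector()
    parser.feed(html)
    return parser.srcs


# Hypothetical markup: what the server sends before any script has run.
static_html = '''
<div id="gallery"></div>
<script>/* images are fetched and inserted here at runtime */</script>
'''

# Hypothetical markup: what the browser's Elements panel shows afterwards.
rendered_html = '''
<div id="gallery">
  <img src="https://example.com/a.jpg">
  <img src="https://example.com/b.jpg">
</div>
'''

print(collect_img_srcs(static_html))    # the raw response has no <img> tags
print(collect_img_srcs(rendered_html))  # the tags only exist after rendering
```

requests only ever sees something like the first snippet, which is why parsing the response for img tags does not turn up the pictures you see in the browser.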
2. Click Network -> click XHR -> scroll down with the mouse wheel so that new images load -> click the newly loaded request -> click Preview on the right
The content under Preview is in JSON format
There is an all_items entry containing items 0, 1, ... Clicking it reveals many numbered entries, and expanding one of them shows a number of URLs. Pasting them into the browser shows that these are the image URLs (hooray)
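As a rough sketch of what to do with that JSON, the snippet below pulls the image URLs out of a payload shaped like the one under Preview. The key names all_items and thumbUrl come from the response described here and the code at the top; the sample payload itself and its example.com URLs are invented, and the real response carries many more fields per item.

```python
import json

# Made-up payload that mimics the shape seen under Preview.
sample_response = '''
{
  "all_items": [
    {"thumbUrl": "https://example.com/thumb/0.jpg"},
    {"thumbUrl": "https://example.com/thumb/1.jpg"},
    {"title": "an item without a thumbnail"}
  ]
}
'''


def extract_thumb_urls(text):
    """Return the thumbUrl of every item that has one."""
    data = json.loads(text)
    items = data.get('all_items', [])
    return [item['thumbUrl'] for item in items if 'thumbUrl' in item]


print(extract_thumb_urls(sample_response))
```

Skipping items without a thumbUrl is a defensive choice of mine; I have not checked whether the real response ever omits the key.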
Once the real image URLs are found, the rest becomes easy. For more details, see the code, or leave a comment ~
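One possible tidy-up is to build the request URL with urllib.parse.urlencode instead of string concatenation, and to ask for the images in smaller pages. This is only a sketch: build_url and page_urls are helper names I made up, and I am assuming (without having verified it) that the endpoint treats start as an offset and len as a page size. The default tag '全部' is simply what %E5%85%A8%E9%83%A8 decodes to.

```python
from urllib.parse import urlencode

BASE_URL = 'https://pic.sogou.com/pics/channel/getAllRecomPicByTag.jsp'


def build_url(category, start, length, tag='全部'):
    """Build the query string; urlencode percent-encodes non-ASCII as UTF-8."""
    params = {'category': category, 'tag': tag, 'start': start, 'len': length}
    return BASE_URL + '?' + urlencode(params)


def page_urls(category, total, page_size):
    """Split a request for `total` images into pages of at most `page_size`.

    Assumption: `start` is an item offset, so consecutive pages do not overlap.
    """
    return [build_url(category, start, min(page_size, total - start))
            for start in range(0, total, page_size)]


for url in page_urls('wallpaper', 35, 15):
    print(url)
```

Note that urlencode would percent-encode a Chinese category name as UTF-8, which may or may not be what the server expects; the working code at the top passes the category through unchanged.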