Here is a simple example: scraping the titles of the pictures on the home page!
The first step for any crawler is UA spoofing: the request is sent to the site disguised as a browser. The request call accepts a headers parameter, and we put the User-Agent into it:
headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36' }
You can find this value in the browser's packet capture (developer) tools.
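To see what the headers parameter actually changes, here is a minimal sketch that builds two requests without sending them and compares the User-Agent each one would carry. The variable names plain and spoofed are only for illustration, and it assumes the requests library is installed.

```python
import requests

# Browser-like User-Agent string, the same value used in this post.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
}
url = 'http://699pic.com/photo/'

session = requests.Session()

# prepare_request builds the final request but does not send it,
# so nothing here touches the network.
plain = session.prepare_request(requests.Request('GET', url))
spoofed = session.prepare_request(requests.Request('GET', url, headers=headers))

print(plain.headers['User-Agent'])    # the library's default, which names python-requests
print(spoofed.headers['User-Agent'])  # the browser string from headers
```

Without the custom header the request announces itself as a script, which is what a site can use to tell a crawler from a browser.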
With that in place, we can send a request to the page and get its data!
import requests
from lxml import etree

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
}
url = 'http://699pic.com/photo/'
response = requests.get(url=url, headers=headers).text  # the page data has now been fetched
Next we need to parse the fetched page into an etree object:
tree = etree.HTML(response)
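As a small illustration of what etree.HTML returns, the sketch below parses a short HTML string instead of the real page. The sample_html fragment is made up for the example; only the etree.HTML and xpath calls are the ones used in this post.

```python
from lxml import etree

# Hypothetical fragment standing in for `response`; the real page is much larger.
sample_html = '<div class="img-show"><p>First title</p><p>Second title</p></div>'

# etree.HTML parses a string and returns the root element of the document.
tree = etree.HTML(sample_html)

print(tree.tag)                  # the parser wraps the fragment in a full document
print(tree.xpath('//p/text()'))  # text of every p tag in the document
```

Because the result is an ordinary element, any XPath expression can be run on it with xpath().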
From the page structure we can see that each picture sits in its own div, that all of these divs share the same parent div, and that the title is in a p tag inside each picture's div. So can we collect these divs in one list and loop over it to get the titles?
The answer is obvious: of course we can.
div_list = tree.xpath('//div[@class="img-show"]/div/div/div')
print(div_list)
for div in div_list:
    name = div.xpath('./a[2]/p/text()')[0]
    print(name)
Here div_list is the collection of picture divs; printing it shows that it is a list.
We then loop over this list and take the text of the p tag from each element in turn.
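The loop can be tried offline on a small stand-in for the page. The markup below is hypothetical and only mimics the nesting described above, and the guard against a missing title is an addition for illustration, not part of the original code.

```python
from lxml import etree

# Hypothetical markup imitating the nesting on the page; the real page differs in detail.
sample_html = '''
<div class="img-show"><div><div>
  <div><a href="/1"><img src="1.jpg"></a><a href="/1"><p>Sunset</p></a></div>
  <div><a href="/2"><img src="2.jpg"></a><a href="/2"><p>Forest</p></a></div>
  <div><a href="/3"><img src="3.jpg"></a></div>
</div></div></div>
'''

tree = etree.HTML(sample_html)
div_list = tree.xpath('//div[@class="img-show"]/div/div/div')

names = []
for div in div_list:
    # The leading './' makes the query relative to this div, not the whole document.
    matches = div.xpath('./a[2]/p/text()')
    if matches:  # skip a div with no title instead of raising IndexError on [0]
        names.append(matches[0])

print(names)
```

Indexing with [0] directly, as in the loop above, is fine when every div has a title; the check only matters if some picture blocks lack the second link.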
The complete code is shown below. Instead of printing each title, it writes them to name.txt:
import requests
from lxml import etree

# Disguise the request as coming from a browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
}
url = 'http://699pic.com/photo/'

# Fetch the page and parse it into an element tree
response = requests.get(url=url, headers=headers).text
tree = etree.HTML(response)

# Each matched div holds one picture
div_list = tree.xpath('//div[@class="img-show"]/div/div/div')
print(div_list)

# Write one title per line; the with block closes the file when the loop ends
with open('name.txt', 'w', encoding='utf-8') as f:
    for div in div_list:
        name = div.xpath('./a[2]/p/text()')[0]
        f.write(name + '\n')