Crawling landscape wallpapers

On a whim, I decided to write a crawler that grabs wallpaper images for you. Without further ado, let's get started. If this article helps you, please like and bookmark it.

1. Find the URL of the wallpaper images you want to crawl, and write out a skeleton of the program

'''
Crawl images from the web:
1. Get the main page's source and take the sub-page links from it.
2. From each sub-page's content, find the image download path.
3. Download the images.
'''

def picture():
    host_page()
    son_page()
    download()
def host_page():
    # Fetch the main page
    pass
def son_page():
    # Fetch the sub-pages
    pass
def download():
    # Download the images
    pass
if __name__ == '__main__':
    picture()

Goal
Press F12 to enter the browser's developer mode and find the picture you need (or right-click the picture and choose Inspect).
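For reference, on this page the thumbnails sit inside a div with class slist, and each list item wraps an <a> tag linking to a sub-page. Here is a minimal sketch to confirm that structure before writing the full crawler (the selector is what the developer tools show; verify it in your own browser, since the site's markup may change):

import requests
from bs4 import BeautifulSoup

# Quick structural check: print the first sub-page link found under <div class="slist">
resp = requests.get('https://pic.netbian.com/4kfengjing/',
                    headers={'User-Agent': 'Mozilla/5.0'})
soup = BeautifulSoup(resp.text, 'html.parser')
first_link = soup.find('div', class_='slist').find('a')
print(first_link.get('href'))  # should look like /tupian/21953.html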

'''
Crawl images from the web:
1. Get the main page's source and take the sub-page links from it.
2. From each sub-page's content, find the image download path.
3. Download the images.
'''
import requests
from bs4 import BeautifulSoup
import time
def picture():
    host_page()
    download()
def host_page():
    # Fetch the main page
    url='https://pic.netbian.com/4kfengjing/'
    headers = {
        'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.67 Safari/537.36 Edg/87.0.664.47"
    }  # headers that mimic a real browser
    resp=requests.get(url,headers=headers)# send the request with the spoofed headers
    newurl=BeautifulSoup(resp.text,'html.parser')# parse the main page content
    print(newurl)
def download():
    # Download the images
    pass
if __name__ == '__main__':
    host_page()

The page source prints with garbled characters, so we need to change the encoding. For the fix, see: https://editor.csdn.net/md/?articleId=112390388 (I wrote the handling method up as a separate blog post; everyone is welcome to read it). After the garbled characters are handled, filter the page source for the content we want.
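If you are not sure which encoding a site uses, requests can also guess it from the raw response bytes. A minimal sketch, assuming you would rather detect the encoding than hard-code 'gbk':

import requests

resp = requests.get('https://pic.netbian.com/4kfengjing/')
print(resp.encoding)            # encoding claimed by the response headers (often wrong)
print(resp.apparent_encoding)   # encoding detected from the body by requests
resp.encoding = resp.apparent_encoding  # apply the detected encoding before reading resp.text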

'''
Crawl images from the web:
1. Get the main page's source and take the sub-page links from it.
2. From each sub-page's content, find the image download path.
3. Download the images.
'''
import requests
from bs4 import BeautifulSoup
import time
def picture():
    host_page()
    download()
def host_page():
    # Fetch the main page
    url='https://pic.netbian.com/4kfengjing/'
    headers = {
        'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.67 Safari/537.36 Edg/87.0.664.47"
    }  # headers that mimic a real browser
    resp=requests.get(url,headers=headers)# send the request with the spoofed headers
    resp.encoding='gbk'# fix the garbled characters
    newurl=BeautifulSoup(resp.text,'html.parser')# parse the main page content
    alist=newurl.find('div',class_='slist').find_all("a")# find all the <a> tags
    for i in alist :
        print(i.get('href'))# print each sub-page link
def download():
    # Download the images
    pass
if __name__ == '__main__':
    host_page()

Result: the sub-page addresses are printed. Next, create a folder named picture to store the photos.
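You can create the folder by hand in your IDE, or let the script create it at startup; a minimal sketch:

import os

# Create the picture folder next to the script if it does not exist yet
os.makedirs('picture', exist_ok=True)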

'''
Crawl images from the web:
1. Get the main page's source and take the sub-page links from it.
2. From each sub-page's content, find the image download path.
3. Download the images.
'''
import requests
from bs4 import BeautifulSoup
import time
def picture():
    # Fetch the main page
    url='https://pic.netbian.com/4kfengjing/'
    headers = {
        'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.67 Safari/537.36 Edg/87.0.664.47"
    }  # headers that mimic a real browser
    resp=requests.get(url,headers=headers)# send the request with the spoofed headers
    resp.encoding='gbk'# fix the garbled characters
    newurl=BeautifulSoup(resp.text,'html.parser')# parse the main page content
    alist=newurl.find('div',class_='slist').find_all("a")# find all the <a> tags
    for i in alist :
        href=i.get('href')# sub-page link, e.g. /tupian/21953.html
        child_resp=requests.get('https://pic.netbian.com'+href,headers=headers)# href is an absolute path, so join it to the site root
        child_resp.encoding='gbk'
        text=child_resp.text
        child_page = BeautifulSoup(text, 'html.parser')
        a=child_page.find('a',id='img')
        img=a.find('img')
        src=img.get('src')
        # Download the image
        print(src)
        img_resp=requests.get('https://pic.netbian.com'+src,headers=headers)# src is also an absolute path, so join it to the site root as well
        img_name=src.split("/")[-1]# use everything after the last / in the url as the file name
        with open('picture/'+img_name,mode='wb') as f :# the picture folder must already exist
            f.write(img_resp.content) # write the raw image bytes
        print('download finished')
        time.sleep(1)# sleep for 1 second so the ip address does not get banned

if __name__ == '__main__':
    picture()

Run the script and the images land in the picture folder, one file per wallpaper. Tip: exclude the picture folder from your IDE's indexing so the IDE stays responsive (convenient).
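If you want more than the first listing page, the later pages appear to follow an index_N.html naming pattern (an assumption based on this site's pagination links; verify it in your browser before relying on it). A sketch of how the loop could be extended:

import requests
from bs4 import BeautifulSoup

# Hypothetical extension: walk the first few listing pages
# (assumed pattern: index_2.html, index_3.html, ... -- verify in the browser)
base = 'https://pic.netbian.com/4kfengjing/'
page_urls = [base] + [base + 'index_%d.html' % n for n in range(2, 4)]
for page_url in page_urls:
    resp = requests.get(page_url, headers={'User-Agent': 'Mozilla/5.0'})
    resp.encoding = 'gbk'
    soup = BeautifulSoup(resp.text, 'html.parser')
    for a in soup.find('div', class_='slist').find_all('a'):
        print(a.get('href'))  # feed these sub-page links into the same download logic as above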

Okay, that's it. Thank you for reading.
