Downloading cosplay images with multiple threads

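Copyright notice: This is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.
Article link: https://blog.csdn.net/qq_45026221/article/details/102634550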

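The image-scraping code I wrote last time used only a single thread, so downloads were slow, and that site's restrictions ruled out multithreaded downloading. This time I switched to a cosplay site and use multiple threads to download quickly.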

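Third-party libraries to install: BeautifulSoup (the beautifulsoup4 package), requests, and lxml (the parser the code passes to BeautifulSoup). threading is part of the Python standard library and needs no installation.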

The code is as follows:

from bs4 import BeautifulSoup
import requests
import os
import threading
import sys


def get_urls():   # Collect the gallery URLs from listing pages 1 to 5
    urls = []
    for i in range(1, 6):
        try:
            res = requests.get('http://www.win4000.com/meinvtag26_' + str(i) + '.html')
            if res.status_code == 200:
                print('Connected')
                soup = BeautifulSoup(res.text, 'lxml')
                # Each <li> in the listing holds the link to one gallery
                items = soup.find(class_='Left_bar').find('ul', class_='clearfix').find_all('li')
                for item in items:
                    urls.append(item.find('a').get('href'))
        except requests.RequestException:
            print('Connection failed')
            sys.exit()
    return urls


def grab_download(url):   # Download every image of one gallery
    index = 1
    res2 = requests.get(url)
    soup2 = BeautifulSoup(res2.text, 'lxml')
    title = soup2.find(class_='ptitle').find('h1').string
    page = soup2.find(class_='ptitle').find('em').string   # used below as the page count
    folder = 'pics/' + title + '/'
    # exist_ok avoids an error if the folder already exists
    os.makedirs(folder, exist_ok=True)
    print('Downloading gallery ' + title)
    for i in range(1, int(page)+1):
        # Page i is at the gallery URL with '_<i>' inserted before '.html'
        res3 = requests.get(url[:-5] + '_' + str(i) + '.html')
        soup3 = BeautifulSoup(res3.text, 'lxml')
        pic_url = soup3.find(class_='pic-meinv').find('a').find('img').get('data-original')
        # Fetch first so a failed request does not leave an empty file behind
        img = requests.get(pic_url).content
        with open(folder + str(index) + '.jpg', 'wb') as f:
            f.write(img)
        index += 1
    print('Finished downloading gallery ' + title)


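if __name__ == '__main__':
    urls = get_urls()
    threads = []
    for url in urls:   # one thread per gallery
        t = threading.Thread(target=grab_download, args=(url,))
        t.start()
        threads.append(t)
    for t in threads:   # wait until every gallery has finished
        t.join()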
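The per-page URL in grab_download is built by slicing off the trailing '.html' and appending the page number. Here is a minimal sketch of just that step. The helper name page_url and the gallery URL in the example are made up for illustration and do not come from the script or the site.

```python
def page_url(gallery_url, page):
    """Return the URL of page `page` of the gallery at `gallery_url`."""
    # Drop the trailing '.html' (5 characters), then append '_<page>.html'
    return gallery_url[:-5] + '_' + str(page) + '.html'


if __name__ == '__main__':
    # Hypothetical gallery URL, used only to show the string manipulation
    print(page_url('http://www.win4000.com/meinv12345.html', 3))
    # prints http://www.win4000.com/meinv12345_3.html
```

Note that the slice assumes the URL really ends in '.html'; any other ending would be cut in the wrong place.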

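Images are saved to the pics folder alongside the code (the path is relative to the directory the script is run from). In my test, everything finished downloading in under 80 s.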
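The script starts one thread per gallery, so the number of simultaneous connections grows with the number of galleries. One possible alternative is a bounded thread pool from the standard library's concurrent.futures. The sketch below is not the approach used in this article: the function name download_all, the default of 8 workers, and the stand-in worker are all assumptions chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(urls, worker, max_workers=8):
    """Run worker(url) for every URL on at most max_workers threads.

    Returns the list of URLs whose worker raised an exception.
    max_workers=8 is an arbitrary illustrative default.
    """
    failed = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its URL so failures can be reported
        futures = {pool.submit(worker, url): url for url in urls}
        for future in as_completed(futures):
            try:
                future.result()   # re-raises any exception from the worker
            except Exception as exc:
                print('Failed:', futures[future], exc)
                failed.append(futures[future])
    return failed


if __name__ == '__main__':
    # Stand-in worker so the sketch runs without network access;
    # with the script above, grab_download would be passed instead.
    def fake_worker(url):
        if url.endswith('bad'):
            raise ValueError('simulated failure')

    print(download_all(['a', 'b', 'bad'], fake_worker, max_workers=2))
```

Leaving the with block waits for all submitted tasks, so no explicit join is needed, and a gallery that fails is reported without stopping the others.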

Writing code isn't easy, so please give this post a like qwq

