进程池爬取汽车之家.py (Scraping Autohome with a process pool) - 代码天地

Category: Other | 2020-01-19 21:17:32

import time
import requests
# Process pool executor:
from concurrent.futures import ProcessPoolExecutor
from bs4 import BeautifulSoup
# cpu_count returns the number of CPU cores on this machine:
from multiprocessing import cpu_count

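def task(page):
    # Build the listing URL from the page number.
    # timeout=10 is an assumed value so a stalled request cannot hang a worker:
    response = requests.get("https://www.autohome.com.cn/all/{}/#liststart".format(page), timeout=10)
    # Inspect the detected encoding if needed:
    # print(response.encoding)
    # Decode the response body as GBK:
    response.encoding = "gbk"
    # Get the decoded text:
    text = response.text
    # Parse the HTML:
    soup = BeautifulSoup(text, "html.parser")
    # Find the article list container:
    div = soup.find(name="div", attrs={"id": "auto-channel-lazyload-article"})
    # Skip the page if the container is missing (for example, after a layout change):
    if div is None:
        return
    # Collect the img tags:
    img_list = div.find_all(name="img")
    # Print the first tag and the number of tags:
    # print(img_list[0], len(img_list))
    print(response.url)
    # Print only the first image link that has a src attribute:
    for img in img_list:
        src = img.get("src")
        if src:
            print("https:" + src)
            break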

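if __name__ == '__main__':
    # Rule of thumb: size a process pool at the CPU core count,
    # and a thread pool at 2-5 times the core count.
    # print(cpu_count())
    start = time.time()
    # Start a process pool with one worker per CPU core (4 workers on a 4-core machine):
    p = ProcessPoolExecutor(max_workers=cpu_count())
    # Submit pages 1 through 109:
    for i in range(1, 110):
        p.submit(task, i)
    # Wait for all submitted tasks to finish:
    p.shutdown()
    print("Elapsed: %s" % (time.time() - start))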
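
The script above submits tasks and never looks at the returned futures, so an exception raised inside a worker goes unnoticed. A minimal sketch of collecting results and errors follows, sized with the thread-pool rule of thumb from the comment above. `fetch_page` and `run_all` are hypothetical names, `fetch_page` is a stand-in that does no network I/O, and the multiplier 4 is an arbitrary pick from the 2-5 range. A thread pool is used here on the assumption that scraping is mostly I/O-bound; `ProcessPoolExecutor` offers the same `submit` interface.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from multiprocessing import cpu_count


def fetch_page(page):
    # Hypothetical stand-in for task(): no network access, it only builds the URL.
    if page < 1:
        raise ValueError("page numbers start at 1")
    return "https://www.autohome.com.cn/all/{}/#liststart".format(page)


def run_all(pages, max_workers):
    """Submit every page, then collect results and errors as futures finish."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the page number it was submitted for.
        futures = {pool.submit(fetch_page, page): page for page in pages}
        for future in as_completed(futures):
            page = futures[future]
            try:
                results[page] = future.result()
            except Exception as exc:
                # result() re-raises whatever the worker raised.
                errors[page] = exc
    return results, errors


if __name__ == "__main__":
    # Assumed sizing: 4 threads per core, within the 2-5x rule of thumb.
    workers = cpu_count() * 4
    results, errors = run_all(range(0, 4), workers)
    print(sorted(results), sorted(errors))
```

Page 0 is included on purpose so that one task fails and shows up in `errors` instead of disappearing silently.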

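Prefixing every `src` with `"https:"` only works when the attribute is protocol-relative (starts with `//`). A small sketch of a more general approach, assuming the standard library's `urllib.parse.urljoin` is acceptable here; `absolute_src` is a hypothetical helper and the image paths are made up for illustration.

```python
from urllib.parse import urljoin


def absolute_src(page_url, src):
    """Resolve an img src against the URL of the page it was found on."""
    return urljoin(page_url, src)


if __name__ == "__main__":
    page_url = "https://www.autohome.com.cn/all/1/"
    # Made-up examples: protocol-relative, already absolute, and root-relative.
    for src in ("//img.example.com/a.jpg",
                "https://img.example.com/b.jpg",
                "/static/c.jpg"):
        print(absolute_src(page_url, src))
```

Inside `task`, `response.url` could serve as the `page_url` argument.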

Reposted from www.cnblogs.com/zhang-da/p/12215525.html