Scraping the ONE homepage's featured text and image with Python requests

Other 2019-07-01 13:54:50

from bs4 import BeautifulSoup
import requests
# The "lxml" parser used below needs the lxml package installed, but it does not need to be imported.

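# Image download helper
def download_img(url, name):
    """
    Download the image at the given URL.
    url: URL of the image
    name: file name (without extension) to save the image under
    """
    try:
        response = requests.get(url)
        response.raise_for_status()  # treat HTTP error statuses as failures
        path = r'C:\Users\86131\Desktop\itchat\send_file\images\%s.jpg' % name
        with open(path, "wb") as f:
            f.write(response.content)
    except Exception as e:
        print("--------- download failed: %s ------------" % e)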

url_list = []

f = requests.get("http://wufazhuce.com/")

# Print the page content
# print(f.content.decode())

soup = BeautifulSoup(f.content,"lxml")

try:
    first_div = soup.find("div",attrs={'id':'main-container'}).find('div',attrs={'class':'carousel-inner'})
    a_all = first_div.find_all('a')

    for i  in a_all:
        url_list.append(i.attrs['href'])

except Exception as e:
    print("--------- failed to parse the homepage: %s ------------" % e)

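# Fetch the featured page linked from the ONE homepage
if not url_list:
    raise SystemExit("no links found on the homepage")
f_1 = requests.get(url_list[0])

# Print the page content
# print(f_1.content.decode())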

soup_1 = BeautifulSoup(f_1.content,"lxml")

try:
    second_div = soup_1.find("div",attrs={'id':'main-container'}).find('div',attrs={'class':'one-cita-wrapper'})
    third_div = soup_1.find("div",attrs={'id':'main-container'}).find('div',attrs={'class':'one-imagen'})

    # Get the date values
    now_month = second_div.find('p',attrs={'class':'may'}).text
    now_one_day = second_div.find('p',attrs={'class':'dom'}).text

    # Get the image URL
    img_url = third_div.find('img').attrs['src']

    # Get the quote and strip surrounding whitespace
    one_text = second_div.find("div",attrs={'class':'one-cita'}).text.strip()

    # Join the date parts
    now_day = now_one_day + ' ' + now_month

    # Download the image
    download_img(img_url, now_day)

except Exception as e:
    print("--------- failed to parse the daily page: %s ------------" % e)
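
The script above mixes the network requests with the parsing, so a markup change on the site only shows up as a generic error message. Below is a minimal sketch of one way to separate the parsing so it can be checked against a saved or hand-written page. It assumes the markup still matches the selectors used above (`main-container`, `carousel-inner`, `one-cita-wrapper`, `one-imagen`, `may`, `dom`, `one-cita`). The function names `parse_home_links` and `parse_daily_page`, the sample HTML, and the `example.com` URLs are hypothetical, and `html.parser` is swapped in for `lxml` only so the sketch runs without an extra parser package.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup that mimics only the elements the script looks for.
SAMPLE_HOME = """
<div id="main-container">
  <div class="carousel-inner">
    <a href="http://example.com/one/1"></a>
    <a href="http://example.com/one/2"></a>
  </div>
</div>
"""

SAMPLE_DAILY = """
<div id="main-container">
  <div class="one-imagen"><img src="http://example.com/pic.jpg"/></div>
  <div class="one-cita-wrapper">
    <div class="one-cita">  A sample quote.  </div>
    <p class="dom">1</p>
    <p class="may">Jul 2019</p>
  </div>
</div>
"""


def parse_home_links(html_text):
    """Return the href of every <a> inside the homepage carousel (empty list if absent)."""
    soup = BeautifulSoup(html_text, "html.parser")
    container = soup.find("div", attrs={"id": "main-container"})
    if container is None:
        return []
    carousel = container.find("div", attrs={"class": "carousel-inner"})
    if carousel is None:
        return []
    return [a["href"] for a in carousel.find_all("a", href=True)]


def parse_daily_page(html_text):
    """Return a dict with date, text and img_url, or None if an expected element is missing."""
    soup = BeautifulSoup(html_text, "html.parser")
    container = soup.find("div", attrs={"id": "main-container"})
    if container is None:
        return None
    cita = container.find("div", attrs={"class": "one-cita-wrapper"})
    imagen = container.find("div", attrs={"class": "one-imagen"})
    if cita is None or imagen is None:
        return None
    day = cita.find("p", attrs={"class": "dom"})
    month = cita.find("p", attrs={"class": "may"})
    quote = cita.find("div", attrs={"class": "one-cita"})
    img = imagen.find("img")
    if any(tag is None for tag in (day, month, quote, img)):
        return None
    img_url = img.get("src")
    if img_url is None:
        return None
    return {
        "date": day.text.strip() + " " + month.text.strip(),
        "text": quote.text.strip(),
        "img_url": img_url,
    }


if __name__ == "__main__":
    print(parse_home_links(SAMPLE_HOME))
    print(parse_daily_page(SAMPLE_DAILY))
```

In the original script these functions would take `f.text` and `f_1.text` as input. Returning `None` or an empty list, rather than raising, lets the caller decide whether a missing element should stop the run.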

Reposted from www.cnblogs.com/changfan/p/11113471.html
