Scraping and Filtering Page Content with Python 3

Other 2018-04-24 10:41:56

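from urllib import request
import re

def getResponse(url):
    # Open the URL and return the HTTP response object.
    url_request = request.Request(url)
    url_response = request.urlopen(url_request)
    return url_response

def getData(data):
    # Match alt attributes whose value holds only Chinese characters and whitespace.
    # In a raw string the whitespace class is \s; \\s would match a literal backslash or "s".
    html = re.findall(r'alt="[\u4E00-\u9FA5\s]+"', data)
    return html

# Fetch list pages 1 through 122 and append every match to c.txt.
for aid in range(1, 123):
    html = "http://www.zhijiaow.com/ShopMallList_%s_0.html" % aid
    http_response = getResponse(html)
    data = http_response.read().decode('utf8')
    l = getData(data)
    # Open the file once per page rather than once per match.
    with open('c.txt', 'a', encoding='utf8') as f:
        for info in l:
            f.write(info)

# c.txt holds the matches back to back (alt="..."alt="...").
# Break them onto separate lines and write the result to a.txt.
with open('c.txt', 'r', encoding='utf8') as f:
    lines = f.readlines()
with open('a.txt', 'a', encoding='utf8') as w:
    for l in lines:
        w.write(l.replace('"alt="', '\n'))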
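The script above writes the raw matches to c.txt and then splits them with a string replace, which leaves a leading alt=" and a trailing quote in the output. A possible alternative, shown here only as a sketch, is to capture the attribute value itself with a regex group so that no intermediate file or replace step is needed. The names ALT_PATTERN and extract_alt_texts and the sample markup are assumptions made for illustration; they are not part of the original post or of the target site.

```python
import re

# Capture only the text inside alt="...", limited to Chinese characters and whitespace.
# ALT_PATTERN and extract_alt_texts are hypothetical names used for this sketch.
ALT_PATTERN = re.compile(r'alt="([\u4E00-\u9FA5\s]+)"')

def extract_alt_texts(html):
    """Return the matching alt values found in html, stripped of surrounding whitespace."""
    return [text.strip() for text in ALT_PATTERN.findall(html)]

# Made-up sample markup, not taken from the real site.
sample = '<img src="a.jpg" alt="红色 茶杯"><img src="b.jpg" alt="logo"><img alt="木桌">'
names = extract_alt_texts(sample)

# One value per line, ready to be written to a text file.
print("\n".join(names))
```

Because the character class only allows Chinese characters and whitespace, an attribute such as alt="logo" is skipped, which matches the filtering behaviour of getData.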

Reposted from www.cnblogs.com/isule/p/8926754.html
