简单爬虫操作:1.简单爬取网页数据并输出 2.爬取数据打印到xls表格中

安装python环境参考菜鸟教程:

传送门:https://www.runoob.com/w3cnote/python-pip-install-usage.html

1.简单爬取网页数据并输出

import requests
from lxml import etree
import xlwt  # NOTE(review): not used in this example; kept because example 2 below needs it


def _print_cleaned(items):
    """Print each entry with newlines and spaces stripped, skipping empties."""
    for each in items:
        cleaned = each.replace('\n', '').replace(' ', '')
        if cleaned:  # skip entries that were only whitespace
            print(cleaned)


def main():
    """Example 1: fetch the category page and print every post link and title."""
    # Download the page source. timeout keeps a stalled server from hanging forever.
    html = requests.get("https://www.ghpym.com/category/videos", timeout=10)
    # Uncomment to inspect the raw source first:
    # print(html.text)

    # Parse the source into a tree that XPath can query.
    etree_html = etree.HTML(html.text)

    # Post links: the @href of each list item's title anchor.
    _print_cleaned(etree_html.xpath('//*[@id="wrap"]/div/div/div/ul/li/div[2]/h2/a/@href'))
    # Post titles: the text() of the same anchors.
    _print_cleaned(etree_html.xpath('//*[@id="wrap"]/div/div/div/ul/li/div[2]/h2/a/text()'))

    print("完成")


if __name__ == "__main__":
    main()

2.爬取数据打印到xls表格中

# coding:utf-8
"""Example 2: scrape post links and write them into an .xls spreadsheet."""
from lxml import etree
import requests
import xlwt

# Collected link hrefs, filled by get_film_name() and written out in the main block.
title = []


def get_film_name(url):
    """Fetch *url* and append every matched link (@href) to the global ``title`` list."""
    # Tip: print(html) first to confirm the page actually returned content.
    html = requests.get(url, timeout=10).text
    s = etree.HTML(html)  # parse the source into an XPath-queryable tree
    # XPath returns a list of href strings (one per post in the list).
    filename = s.xpath('//*[@id="wrap"]/div/div/div/ul/li/div[2]/h2/a/@href')
    print(filename)
    title.extend(filename)


def get_all_film_name():
    """Collect links from the category page.

    Bug fix: the original looped ``for i in range(0, 250, 25)`` and called
    ``'...'.format(i)`` on a URL with no ``{}`` placeholder, so every iteration
    fetched the identical page and stored ten duplicate copies of each link.
    The page is fetched once instead.
    NOTE(review): if pagination was intended, add a real placeholder such as
    '...?paged={}' — confirm the site's paging scheme first.
    """
    get_film_name('https://www.ghpym.com/category/videos')


# Bug fix: the original wrote ``if '_main_':`` — a truthy string literal, not
# the standard module guard — so the block also ran on import.
if __name__ == '__main__':
    myxls = xlwt.Workbook()
    sheet1 = myxls.add_sheet(u'top250', cell_overwrite_ok=True)
    get_all_film_name()
    # Column 0: 1-based row number; column 1: the scraped link.
    for i, link in enumerate(title):
        sheet1.write(i, 0, i + 1)
        sheet1.write(i, 1, link)
    myxls.save('top250.xls')
    print("完成")

猜你喜欢

转载自www.cnblogs.com/jessezs/p/12584505.html