MOOC Exercise: Scraping Taobao Product Information - 代码天地

Other · 2018-08-09 04:50:13

import re

import requests


def getHTMLText(url):
    """Fetch a page and return its text, or an empty string on failure."""
    try:
        # Note: Taobao search may now require login cookies/headers; without
        # them the response may be a login page with no product data.
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return ""


def parsePage(ilt, html):
    """Append [price, title] pairs found in the page source to ilt."""
    # Capture groups return only the values, so no split() or eval() is needed,
    # and titles containing ":" are no longer cut off.
    plt = re.findall(r'"view_price":"([\d.]*)"', html)
    tlt = re.findall(r'"raw_title":"(.*?)"', html)
    # zip() stops at the shorter list, so mismatched counts cannot raise IndexError.
    for price, title in zip(plt, tlt):
        ilt.append([price, title])


def printGoodsList(ilt):
    tplt = "{:4}\t{:8}\t{:16}"
    print(tplt.format("No.", "Price", "Product name"))
    for count, g in enumerate(ilt, start=1):
        print(tplt.format(count, g[0], g[1]))


def main():
    goods = "书包"  # search keyword ("backpack")
    depth = 3       # number of result pages to crawl
    start_url = "https://s.taobao.com/search?q=" + goods
    infoList = []
    for i in range(depth):
        # Each result page holds 44 items; s is the item offset of the page.
        url = start_url + "&s=" + str(44 * i)
        html = getHTMLText(url)
        parsePage(infoList, html)
    printGoodsList(infoList)


if __name__ == "__main__":
    main()
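The main loop pages through the results by appending an s offset of 44*i to the search URL, one URL per result page. As a minimal sketch, the same URLs can be built with urllib.parse.urlencode, which percent-encodes the Chinese keyword and joins the parameters for you instead of concatenating strings by hand. The helper name build_search_urls and the PAGE_SIZE constant are hypothetical; only the 44-item offset comes from the original code.

```python
from urllib.parse import urlencode

PAGE_SIZE = 44  # items per result page, matching the 44*i offset in main()


def build_search_urls(goods, depth, page_size=PAGE_SIZE):
    """Return one search URL per result page (hypothetical helper)."""
    base = "https://s.taobao.com/search?"
    urls = []
    for i in range(depth):
        # urlencode percent-encodes the keyword as UTF-8 and joins pairs with "&"
        query = urlencode({"q": goods, "s": page_size * i})
        urls.append(base + query)
    return urls


for url in build_search_urls("书包", 3):
    print(url)
```

The first URL printed is https://s.taobao.com/search?q=%E4%B9%A6%E5%8C%85&s=0, and the offset grows by 44 for each following page.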

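This crawler follows the requests-re route to implement targeted price-comparison scraping of Taobao products, rather than the requests-BeautifulSoup approach. Because the data it needs appears in the page source as key-value pairs, extracting it with regular expressions is simpler here than using the bs4 library. The main difficulty is therefore writing the regular expressions correctly.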
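The crawler only collects [price, title] pairs; the price comparison itself still has to be done afterwards. A minimal sketch, assuming prices arrive as strings such as "59.90" (the regex [\d.]* can also match an empty string), converts them to floats and sorts cheapest first. The function name cheapest_first and the sample data are hypothetical.

```python
def cheapest_first(items):
    """Sort [price, title] pairs by numeric price, skipping unparsable prices."""
    parsed = []
    for price, title in items:
        try:
            parsed.append((float(price), title))
        except ValueError:
            continue  # e.g. an empty string from a price with no digits
    return sorted(parsed)  # tuples sort by price first, then by title


# Hypothetical sample pairs in the same shape as infoList
sample = [["129.00", "bag A"], ["59.90", "bag B"], ["", "bag C"], ["99.00", "bag D"]]
for price, title in cheapest_first(sample):
    print(f"{price:8.2f}  {title}")
```

Sorting tuples keeps the code short; if the title should not act as a tiebreaker, sorted(parsed, key=lambda p: p[0]) does the same by price alone.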

Looking at the page source, each price is stored as a key-value pair, so we search for the key "view_price" to locate the prices.
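Without a capture group, re.findall returns the whole matched key-value text, which is why the value then has to be split off and unquoted. With a capture group around the number, findall returns only the values. The HTML fragment here is hypothetical; only the key name view_price comes from the article.

```python
import re

# Hypothetical fragment of page source
html = '{"raw_title":"bag A","view_price":"129.00"},{"raw_title":"bag B","view_price":"59.90"}'

# No capture group: findall returns the entire match
whole = re.findall(r'"view_price":"[\d.]*"', html)
print(whole)   # ['"view_price":"129.00"', '"view_price":"59.90"']

# One capture group: findall returns only the captured number
prices = re.findall(r'"view_price":"([\d.]*)"', html)
print(prices)  # ['129.00', '59.90']
```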

Product names are stored as key-value pairs as well, so we search for the key "raw_title" to locate the product names.
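Titles are free text, so the pattern for raw_title uses the non-greedy .*? to stop at the first closing quote. A greedy .* would run to the last quote on the line and swallow every following product. A capture group also copes with titles that contain a colon, which a split(":")[1] approach would cut off. The fragment and titles below are hypothetical.

```python
import re

# Hypothetical fragment with two products on one line; the second title has a colon
html = ('"raw_title":"bag A","view_price":"129.00",'
        '"raw_title":"bag B: 30L","view_price":"59.90"')

greedy = re.findall(r'"raw_title":"(.*)"', html)   # runs to the LAST quote
lazy = re.findall(r'"raw_title":"(.*?)"', html)    # stops at the FIRST quote
print(greedy)  # a single match spanning both products
print(lazy)    # ['bag A', 'bag B: 30L']
```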

Reposted from www.cnblogs.com/tianqianlan/p/9446578.html