Environment
- Python 3.8
- PyCharm
Modules used
- requests
- jieba (Chinese word segmentation)
- wordcloud (word cloud generation)
Data source analysis
Clarify the requirements <data source analysis>
- What data is being collected? Work out which URL returns the data you want.
- Packet capture analysis: use the browser's built-in developer tools
I. Press F12 (or right-click and choose Inspect), select the Network tab, then click through to the second page of comments.
II. Copy a piece of comment text and search for it in the developer tools; this leads directly to the data packet that contains the comments:
https://club.jd.com/comment/productPageComments.action?callback=fetchJSON_comment98&productId=100029079354&score=0&sortType=5&page=1&pageSize=10&isShadowSku=0&rid=0&fold=1
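The captured address can be split into a base URL and a set of query parameters, which is the form the request code needs. The following is a minimal sketch using only the standard library; the helper name `split_request_url` is made up for illustration.

```python
from urllib.parse import urlsplit, parse_qsl

# The address found in the developer tools, copied from the step above
captured = ('https://club.jd.com/comment/productPageComments.action'
            '?callback=fetchJSON_comment98&productId=100029079354&score=0'
            '&sortType=5&page=1&pageSize=10&isShadowSku=0&rid=0&fold=1')


def split_request_url(url):
    """Return (base_url, params) for a URL that carries a query string."""
    parts = urlsplit(url)
    base_url = f'{parts.scheme}://{parts.netloc}{parts.path}'
    # parse_qsl turns "a=1&b=2" into [('a', '1'), ('b', '2')]
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    return base_url, params


base_url, params = split_request_url(captured)
print(base_url)
for key, value in params.items():
    print(f'{key} = {value}')
```

Each printed key corresponds to one entry of the parameter dictionary that is passed to the request.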
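Data collection: code implementation
Send the request
import requests

# Page number to request; change it, or loop over it, to fetch other pages
page = 1
# The address is masked in the original post
url = 'https://***.com'
# Request parameters --> a dictionary built from the complete key-value pairs
data = {
    # 'callback': 'fetchJSON_comment98',
    'productId': '100029079354',
    'score': '0',
    'sortType': '5',
    'page': page,
    'pageSize': '10',
    'isShadowSku': '0',
    'rid': '0',
    'fold': '1',
}
# Imitate a browser --> headers (request headers)
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36'
}
# Send the request with the get method of the requests module
# Left of each equals sign: url/params/headers are parameter names of get()
# Right of each equals sign: url/data/headers are the variables passed in
response = requests.get(url=url, params=data, headers=headers)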
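Because the parameter dictionary depends on a `page` value, collecting several pages comes down to rebuilding the parameters for each page number. The sketch below is one way to do that, not the article's own code: the names `build_params` and `fetch_pages`, the one-second delay, and the timeout are assumptions, and the article does not say whether the first page is numbered 0 or 1.

```python
import time


def build_params(page, product_id='100029079354', page_size=10):
    """Return the query parameters for one page of comments."""
    return {
        'productId': product_id,
        'score': '0',
        'sortType': '5',
        'page': str(page),
        'pageSize': str(page_size),
        'isShadowSku': '0',
        'rid': '0',
        'fold': '1',
    }


def fetch_pages(url, headers, first_page, last_page, delay=1.0):
    """Yield the parsed JSON of each page from first_page to last_page."""
    # Third-party module, imported here so build_params works without it
    import requests
    for page in range(first_page, last_page + 1):
        response = requests.get(url, params=build_params(page),
                                headers=headers, timeout=10)
        response.raise_for_status()  # stop early on an HTTP error status
        yield response.json()
        time.sleep(delay)  # pause between requests (assumed value)
```

A call such as `for payload in fetch_pages(url, headers, 1, 5): ...` would then hand back one dictionary per page.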
Get the data
The server returns the response data
- response: the response object
- response.text: the response body as text
- response.json(): the response body parsed from JSON into a dictionary
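The difference between the text and the parsed dictionary can be shown without a network call. The request code leaves the `callback` parameter commented out, presumably so that the server returns plain JSON instead of JSON wrapped in a callback (JSONP). The sketch below assumes that wrapper has the form `name(...);`; the sample body and the helper name `parse_body` are made up for illustration.

```python
import json
import re


def parse_body(text):
    """Parse a body that is either plain JSON or JSON wrapped as name(...);"""
    text = text.strip()
    # A JSONP body looks like: someCallback({...});
    match = re.fullmatch(r'[\w$.]+\((.*)\);?', text, flags=re.DOTALL)
    if match:
        text = match.group(1)  # keep only the JSON between the parentheses
    return json.loads(text)


# Invented sample bodies, standing in for response.text
plain = '{"comments": [{"content": "sample comment"}]}'
wrapped = 'fetchJSON_comment98(' + plain + ');'

print(type(plain).__name__)              # the raw text is a str
print(type(parse_body(plain)).__name__)  # the parsed result is a dict
print(parse_body(wrapped) == parse_body(plain))
```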
Parse the data
Dictionary data type: extract content by key <dictionary lookup>
Use the content to the left of the colon [key] to extract the content to the right of the colon [value]
# The for loop goes through the list and takes out its elements one by one
for i in response.json()['comments']:
    content = i['content']
    print(content)
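The same key lookup can be tried on a small hand-written dictionary. In the sketch below only the `comments` and `content` keys come from the article; the sample values, the extra `score` field, and the helper name `extract_contents` are invented. Using `.get` is a design choice so that an entry without a `content` key is skipped instead of raising `KeyError`.

```python
def extract_contents(payload):
    """Return the 'content' string of every entry under 'comments'."""
    contents = []
    for item in payload.get('comments', []):
        content = item.get('content')
        if content:  # skip entries with a missing or empty 'content'
            contents.append(content)
    return contents


# Invented sample data shaped like the parsed response
sample = {
    'comments': [
        {'content': 'first sample comment', 'score': 5},
        {'content': 'second sample comment'},
        {'score': 4},  # no 'content' key, so this entry is skipped
    ]
}

print(extract_contents(sample))
```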
Save the data
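# Append the comment to the file; run this inside the for loop so every comment is saved
with open('口红评论.txt', mode='a', encoding='utf-8') as f:
    # Write the comment content
    f.write(content)
    f.write('\n')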
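An alternative to reopening the file for every comment is to collect the comments first and write them in one pass. This is a sketch under that assumption, not the article's code; the function name `save_comments` and its return value are illustrative.

```python
def save_comments(contents, path, mode='a'):
    """Write each comment on its own line; return how many were written."""
    count = 0
    with open(path, mode=mode, encoding='utf-8') as f:
        for content in contents:
            # Collapse line breaks and other whitespace runs into single
            # spaces so that one comment stays on one line
            f.write(' '.join(content.split()))
            f.write('\n')
            count += 1
    return count
```

For example, `save_comments(contents, '口红评论.txt')` appends one line per comment to the same file the article uses.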
Word cloud code
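The article does not include the word cloud code itself, so the following is only a rough sketch of how the saved comments could be turned into an image with jieba and wordcloud. The function names, the image size, the background colour, and the rule of dropping tokens shorter than two characters are all assumptions. A font that contains Chinese glyphs has to be supplied through `font_path`, otherwise the characters are likely to render as empty boxes.

```python
def read_text(path):
    """Read the whole comment file as one string."""
    with open(path, encoding='utf-8') as f:
        return f.read()


def keep_tokens(tokens, min_len=2):
    """Drop whitespace and very short tokens, then join with spaces."""
    kept = [t.strip() for t in tokens if len(t.strip()) >= min_len]
    return ' '.join(kept)


def build_wordcloud(comment_path, font_path, out_path):
    """Segment the comments with jieba and save a word cloud image."""
    # Third-party modules, imported here so the helpers above work alone
    import jieba
    from wordcloud import WordCloud

    tokens = jieba.lcut(read_text(comment_path))  # list of segmented words
    text = keep_tokens(tokens)
    cloud = WordCloud(font_path=font_path, width=800, height=600,
                      background_color='white')
    cloud.generate(text)    # count word frequencies and lay out the cloud
    cloud.to_file(out_path)
```

A call might look like `build_wordcloud('口红评论.txt', 'msyh.ttc', 'wordcloud.png')`, where the font file name and the output name are placeholders for whatever is available on your machine.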
That's all for today. If you have any questions about the article, leave a comment or send a private message.