Before you know it, we have reached the sixth lesson of our crawler basics course. Today we will fetch Weibo posts for reading and learning!
PS: the previous lessons are all in this column, and everyone is welcome to dig through them.
The first step is to log in to Weibo.
Click the search box in the upper left corner and find the user whose posts you want to collect:
You can see that there are two search methods here:
1. Search by keyword
2. Search by time
Today we will focus on the code!
First, we search by [time]: select the time range, press [F12] (or right-click and choose Inspect) to open the developer tools, and then click search.
At this point we can see that this is a [GET request], so the parameters also appear in the URL. Let's take a look at them:
'uid': the user id
'starttime': '1690214400', the start of the time range as a Unix timestamp (seconds)
'endtime': '1690473600', the end of the time range as a Unix timestamp (seconds)
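To make the timestamp parameters less mysterious, here is a minimal sketch of how such values can be produced from calendar dates and turned into a query string. It assumes the dates are interpreted as midnight Beijing time (UTC+8), which is inferred from the two values above and not stated by Weibo; the helper name `to_timestamp` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

# Assumption: search dates mean midnight in Beijing time (UTC+8).
BEIJING = timezone(timedelta(hours=8))


def to_timestamp(date_str):
    """Convert 'YYYY-MM-DD' (midnight, Beijing time) to a Unix timestamp string."""
    dt = datetime.strptime(date_str, '%Y-%m-%d').replace(tzinfo=BEIJING)
    return str(int(dt.timestamp()))


params = {
    'uid': '2656274875',  # the user id used in this lesson's example
    'starttime': to_timestamp('2023-07-25'),
    'endtime': to_timestamp('2023-07-28'),
}

# A GET request carries its parameters in the URL as a query string.
query = urlencode(params)
print(query)  # uid=2656274875&starttime=1690214400&endtime=1690473600
```

With this helper you can change the date range without looking up timestamps by hand.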
Code 1: get the JSON (the full code is attached at the end)
Note: please fill in your own cookie.
import json
import time
import requests
cookie = {
    'cookie': 'fill in your own cookie'}
headers = {
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'}
Done:
Code 2 [Expand content]: if you don't click [Expand], you only get part of the content, not the full text.
In the same way, click [Expand] to find the id of the current Weibo post, then send another request with that id to get the full content!
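The second request described above could look roughly like the sketch below. The endpoint URL and the `longTextContent` field name are assumptions that do not come from this lesson, so check both in the F12 Network tab when you click [Expand]. The function name `fetch_full_text` is also made up, and the HTTP function is passed in as `get` so the sketch does not depend on any one library.

```python
# Assumed endpoint: not given in this lesson. Confirm the real URL in the
# F12 Network tab when you click [Expand].
LONGTEXT_URL = 'https://weibo.com/ajax/statuses/longtext'


def fetch_full_text(mblogid, get, headers=None, cookies=None):
    """Request the full text of one post by its id.

    `get` is any callable with the same interface as requests.get.
    """
    resp = get(LONGTEXT_URL, params={'id': mblogid},
               headers=headers, cookies=cookies, timeout=10)
    resp.raise_for_status()
    data = resp.json().get('data') or {}
    # 'longTextContent' is an assumed field name; returns None if absent.
    return data.get('longTextContent')
```

In a real run you would call it as `fetch_full_text(mblogid, requests.get, headers=headers, cookies=cookie)`, using the id of the post whose text was cut off.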
Code 2 data cleaning
date = con_json['data']['list'][i]['created_at']  # date posted
con = con_json['data']['list'][i]['text_raw']  # post content
reposts_count = con_json['data']['list'][i]['reposts_count']  # number of reposts
comments_count = con_json['data']['list'][i]['comments_count']  # number of comments
attitudes_count = con_json['data']['list'][i]['attitudes_count']  # number of likes
mblogid = con_json['data']['list'][i]['mblogid']  # post id
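The field lookups above can be wrapped in a small function that cleans a whole page at once. This is only a sketch: the function name `clean_page` and the sample values are invented for illustration, while the keys are the ones used in the lines above.

```python
def clean_page(con_json):
    """Pull the fields we care about out of one page of results."""
    rows = []
    for item in con_json['data']['list']:
        rows.append({
            'date': item['created_at'],
            'content': item['text_raw'],
            'reposts_count': item['reposts_count'],
            'comments_count': item['comments_count'],
            'attitudes_count': item['attitudes_count'],
            'mblogid': item['mblogid'],
        })
    return rows


# Made-up sample data, shaped like the response described above.
sample = {'data': {'list': [{
    'created_at': 'Tue Jul 25 10:00:00 +0800 2023',
    'text_raw': 'hello weibo',
    'reposts_count': 1,
    'comments_count': 2,
    'attitudes_count': 3,
    'mblogid': 'abc123',
}]}}

for row in clean_page(sample):
    print(row['date'], row['mblogid'], row['content'])
```

Looping over the list directly avoids juggling the index `i` by hand.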
I don't know how many posts were published during this period, so I simply set the loop to run for up to 999 pages.
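Since the real page count is unknown, one option is to stop as soon as a page comes back empty. The sketch below assumes that a page past the end returns an empty `list`, which you should confirm against the real response. `iter_posts` and `fetch_page` are hypothetical names, where `fetch_page(page)` stands for whatever function sends the request and returns the parsed JSON.

```python
def iter_posts(fetch_page, max_pages=999):
    """Yield posts page by page, stopping at the first empty page."""
    for page in range(1, max_pages + 1):
        con_json = fetch_page(page)
        posts = con_json['data']['list']
        if not posts:
            # Assumption: an empty list means there are no more pages.
            break
        yield from posts


# Fake fetcher with made-up data, so the sketch runs without network access.
fake_pages = {1: [{'mblogid': 'a'}, {'mblogid': 'b'}], 2: [{'mblogid': 'c'}]}
requested = []


def fake_fetch(page):
    requested.append(page)
    return {'data': {'list': fake_pages.get(page, [])}}


all_posts = list(iter_posts(fake_fetch))
print(len(all_posts), requested)  # 3 [1, 2, 3]
```

The upper limit of 999 then acts only as a safety cap, and no requests are wasted on pages that do not exist.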
import json
import time
import requests
cookie = {
    'cookie': 'fill in your own cookie'}
headers = {
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'}
for i1 in range(1, 999):
    params2 = {
        'uid': '2656274875',
        'page': f'{i1}',  # current page number
        'feature': '0',
        'starttime': '1690214400',
        'endtime': '1690473600',
        # flags copied from the captured search request
        'hasori': 1,
        'hasret': 1,
        'hastext': 1,
        'haspic': 1,
<