Python Crawler in Practice (Advanced) - 6: Obtaining Weibo Information (with complete code)

In the blink of an eye we have reached the sixth lesson of our crawler course. Today we will fetch Weibo posts for reading and study!

PS: The previous lessons are all in this column, feel free to dig through the archives: click me

First of all, step one is to log in to Weibo: click me

Click the search box in the upper left corner and find the user whose posts you want to collect.


You can see that there are two search methods here:


1. Search by keyword

2. Search by time

Now let's get to the code!

First, we search by [time]: select a time range, press F12 (or right-click and choose Inspect) to open DevTools, and then click Search.


We can see that this is a [GET request], so the parameters also appear in the URL. Let's look at the parameters.


`uid`: the user ID

`starttime`: '1690214400', a Unix timestamp (start of the selected range)

`endtime`: '1690473600', a Unix timestamp (end of the selected range)

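Those two timestamps can be produced from ordinary dates. A minimal sketch, assuming both times are midnight Beijing time (UTC+8), which reproduces the exact values above:

```python
import calendar
import time

def beijing_midnight_ts(datestr: str) -> int:
    """Convert a 'YYYY-MM-DD' date (midnight, Beijing time) to a Unix timestamp."""
    # Parse the date as if it were UTC, then subtract the +8 hour offset.
    return calendar.timegm(time.strptime(datestr, '%Y-%m-%d')) - 8 * 3600

print(beijing_midnight_ts('2023-07-25'))  # 1690214400, the starttime above
print(beijing_midnight_ts('2023-07-28'))  # 1690473600, the endtime above
```

Using `calendar.timegm` instead of `time.mktime` keeps the result independent of the timezone of the machine running the script.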

Code 1 - get the JSON (the full code is attached at the end)

Note: please fill in your own cookie.

```python
import json
import time

import requests

cookie = {
    'cookie': 'paste your own cookie here'}
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'}
```

Setup done.

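With the cookie and headers in place, the fetch itself is a plain `requests.get` with the URL parameters we saw above. A sketch of how the URL is assembled; the `searchProfile` endpoint path is taken from the address bar in the DevTools capture and should be treated as an assumption, verify it against your own capture:

```python
import requests

# Endpoint observed in DevTools for the profile time-range search (assumption).
url = 'https://weibo.com/ajax/statuses/searchProfile'
params = {
    'uid': '2656274875',
    'page': '1',
    'starttime': '1690214400',
    'endtime': '1690473600',
}
# Build the request without sending it, just to show the final URL.
prepared = requests.Request('GET', url, params=params).prepare()
print(prepared.url)
```

Sending it for real is `requests.get(url, params=params, cookies=cookie, headers=headers).json()`.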

Code 2 [Expand content]: if you don't click [Expand] on a long post, you only get part of the content, not the full text.


In the same way, click Expand in DevTools, grab the id of the current Weibo post, and then make a second request to get the full version of the content!

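In code, "request again with the post's id" means a second GET that passes the `mblogid` as an `id` parameter. A hedged sketch; the `longtext` endpoint name and the response field are assumptions based on what the Expand click typically shows in DevTools, so check your own capture:

```python
import requests

mblogid = 'N7xxxxxxx'  # hypothetical id, taken from the first response in practice

# Endpoint seen when clicking [Expand] on a long post (assumption).
url = 'https://weibo.com/ajax/statuses/longtext'
prepared = requests.Request('GET', url, params={'id': mblogid}).prepare()
print(prepared.url)
# Sending this with your cookie should return JSON whose data contains
# the full text (field name varies; inspect the response to find it).
```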

Code 2 - data cleaning

```python
date = con_json['data']['list'][i]['created_at']                  # date
con = con_json['data']['list'][i]['text_raw']                     # content
reposts_count = con_json['data']['list'][i]['reposts_count']      # reposts
comments_count = con_json['data']['list'][i]['comments_count']    # comments
attitudes_count = con_json['data']['list'][i]['attitudes_count']  # likes
mblogid = con_json['data']['list'][i]['mblogid']                  # post ID
```
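To see those six lines in action, here is the same cleaning applied to a tiny hand-made response; every value in the sample dict below is made up for illustration, only the field names come from the real API:

```python
# Made-up sample mimicking the shape of the real response.
con_json = {'data': {'list': [{
    'created_at': 'Tue Jul 25 10:00:00 +0800 2023',
    'text_raw': 'hello weibo',
    'reposts_count': 3,
    'comments_count': 5,
    'attitudes_count': 7,
    'mblogid': 'N7abcdEfg',
}]}}

rows = []
for item in con_json['data']['list']:
    rows.append([item['created_at'], item['text_raw'],
                 item['reposts_count'], item['comments_count'],
                 item['attitudes_count'], item['mblogid']])

print(rows[0])
```

Collecting the fields into a list of rows like this makes it easy to dump everything to CSV later.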

I don't know in advance how many posts were published in this period, so I simply loop up to 999 pages.

```python
import json
import time

import requests

cookie = {
    'cookie': 'paste your own cookie here'}
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'}

for i1 in range(1, 999):
    params2 = {
        'uid': '2656274875',
        'page': f'{i1}',
        'feature': '0',
        'starttime': '1690214400',
        'endtime': '1690473600',
        'hasori': 1,
        'hasret': 1,
        'hastext': 1,
        'haspic': 1,
        # ... (the rest of the listing is truncated in the original post)
    }
```
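Since the original listing cuts off above, the loop body below is my own hedged reconstruction rather than the author's code. It shows one way to avoid actually requesting all 999 pages: stop at the first empty page. `fetch_page` is a hypothetical stand-in for the real `requests.get(...).json()` call, so the paging logic can be shown (and tested) without a network connection:

```python
def crawl(fetch_page, max_pages=999):
    """Collect posts across pages; fetch_page(page) returns the parsed JSON."""
    results = []
    for page in range(1, max_pages):
        con_json = fetch_page(page)
        posts = con_json.get('data', {}).get('list', [])
        if not posts:  # an empty page means no more posts in the time range
            break
        for item in posts:
            results.append([item['created_at'], item['text_raw'],
                            item['reposts_count'], item['comments_count'],
                            item['attitudes_count'], item['mblogid']])
    return results

# A fake fetcher standing in for the real request: one page of data, then empty.
def fake_fetch(page):
    if page == 1:
        return {'data': {'list': [{'created_at': 'd', 'text_raw': 't',
                                   'reposts_count': 0, 'comments_count': 1,
                                   'attitudes_count': 2, 'mblogid': 'x'}]}}
    return {'data': {'list': []}}

print(len(crawl(fake_fetch)))
```

In the real crawler, `fetch_page` would call `requests.get` with `params2`, `cookies=cookie`, and `headers=headers`, and it is polite to `time.sleep(1)` between pages.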

Origin blog.csdn.net/weixin_42636075/article/details/131969482