Crawling Sina Weibo data with a crawler

Tool: cloud mining crawler

Goal: Crawl all of a blogger's Weibo posts

Analyzing the web page structure:

Our approach is to simulate a browser automatically visiting and scraping the pages.

Let's look at the page structure first. Each Weibo list page is loaded lazily in three or four chunks; once the page-turning button appears at the bottom, the page can be considered fully loaded.
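As a rough sketch of this idea (assuming Selenium drives the browser; the profile URL and the ".m-page" selector are placeholders, not the tool's actual configuration), loading a list page and waiting for the page-turn button could look like this:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

driver = webdriver.Chrome()
driver.get("https://weibo.com/u/1234567890")  # hypothetical blogger home page

# The list loads lazily; scroll a few times so every chunk is fetched.
for _ in range(4):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(1)

# Treat the appearance of the page-turning button at the bottom as the
# "page fully loaded" signal. The ".m-page" selector is an assumption.
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, ".m-page"))
)
html = driver.page_source
```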

 

 

The login problem:

Crawling requires a logged-in account, so how do we log in?

Logging in normally does not require a verification code; one is only requested after a failed attempt, so logging in poses no technical difficulty.

We can create a login module: log in once with a browser, and then crawl all pages using the cookies shared by that browser session.
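A minimal sketch of this cookie-sharing idea, assuming the login is done by hand in a Selenium-driven browser and its cookies are then reused in a requests session (the URLs below are placeholders):

```python
import requests
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://weibo.com/login.php")  # placeholder login URL
input("Log in manually in the browser window, then press Enter here...")

# Copy the browser's cookies into a requests session so every later
# page fetch is made as the logged-in user.
session = requests.Session()
for cookie in driver.get_cookies():
    session.cookies.set(cookie["name"], cookie["value"], domain=cookie.get("domain"))

resp = session.get("https://weibo.com/u/1234567890?page=1")  # hypothetical list page
print(resp.status_code)
```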

 

 

Flow chart design:

 

 

 

We don't need a detail page for each Weibo post, so the crawler flow has no detail-page step; all the data is extracted directly from the list pages.
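As an illustration of extracting everything from the list pages alone (the selectors and field names below are assumptions for the sketch, not the tool's actual extraction rules):

```python
from bs4 import BeautifulSoup

def parse_list_page(html):
    """Extract post text and timestamp from one list page; no detail-page requests."""
    soup = BeautifulSoup(html, "html.parser")
    posts = []
    for card in soup.select(".card-wrap"):        # one card per Weibo post (assumed selector)
        text_node = card.select_one(".txt")
        time_node = card.select_one(".from a")
        if text_node is None:
            continue
        posts.append({
            "text": text_node.get_text(strip=True),
            "time": time_node.get_text(strip=True) if time_node else "",
        })
    return posts
```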

Crawling results:

Crawling 10 pages took about 5 minutes in total and yielded roughly 400 posts; the count is modest because I don't post on Weibo very often.

The data is as follows:

 

 

Make a simple word cloud:
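A minimal word-cloud sketch, assuming the crawled posts were saved to a CSV file with a `text` column and that `jieba` handles Chinese word segmentation (the file name, column name, and font path are all assumptions):

```python
import jieba
import pandas as pd
from wordcloud import WordCloud

# Assumed data format: the crawled posts were saved to weibo.csv with a "text" column.
df = pd.read_csv("weibo.csv")
tokens = " ".join(jieba.cut(" ".join(df["text"].astype(str))))

cloud = WordCloud(
    font_path="simhei.ttf",      # a CJK-capable font is required for Chinese text
    width=800,
    height=600,
    background_color="white",
).generate(tokens)
cloud.to_file("wordcloud.png")
```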



Origin: blog.csdn.net/milu2003516/article/details/106208880