Zhihu crawler: scraping negative comments, using the Tianjin University trending topic as an example

When I woke up, I found Tianjin University on the trending list. Tianjin University is my alma mater, so how could people slander it like this? Enough said: roll up the sleeves first thing in the morning and get to work!

At first I wanted to iterate over the pages, but when I tried scrolling down, it turned out the content is loaded asynchronously via Ajax (PS: asynchronous rendering, not the page-turning kind). Fine!

So I went straight to capturing the network traffic, and the following API caught my eye.

Open it and take a look: proper JSON data. (Posting anonymously is useless. Each id is unique, so anyone who wants to look you up can match it against the Zhihu database.)

Based on the offset parameter highlighted in the figure above, the page URLs can be constructed directly and crawled one by one.
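As a rough illustration of stepping through pages by offset, here is a minimal sketch. The endpoint URL, the `limit` parameter name, and the page size of 5 are assumptions for illustration only; the real values are whatever shows up in the captured request.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the URL seen in the network capture.
API_URL = "https://example.com/api/questions/QUESTION_ID/answers"


def page_urls(total, limit=5, base=API_URL):
    """Yield one URL per page, advancing `offset` by `limit` until `total` is covered."""
    for offset in range(0, total, limit):
        # `limit` and `offset` are assumed parameter names.
        yield f"{base}?{urlencode({'limit': limit, 'offset': offset})}"


# 12 answers at 5 per page need offsets 0, 5 and 10.
urls = list(page_urls(total=12, limit=5))
for u in urls:
    print(u)
```

Each generated URL can then be requested in turn, ideally with a short pause between requests.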

Of course, although the lovely Zhihu loads its content asynchronously via Ajax, it still leaves paging information at the end of each response and thoughtfully tells you the total count, so there is no need to construct the URLs yourself.
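Following the paging information instead of building URLs by hand might look like the sketch below. The field names `data`, `paging`, `is_end` and `next` are assumptions about the JSON layout, and `fetch` is a hypothetical callable that takes a URL and returns the decoded JSON, injected so the loop can be tried without touching the network.

```python
def iter_answers(fetch, first_url):
    """Yield every item, following the assumed `paging.next` link until `is_end` is true."""
    url = first_url
    while url:
        page = fetch(url)
        yield from page.get("data", [])
        paging = page.get("paging", {})
        # Stop when the server says this is the last page or gives no next link.
        url = None if paging.get("is_end", True) else paging.get("next")


# Offline stand-in for the real API: two fake pages keyed by URL.
fake_pages = {
    "page1": {"data": [{"id": 1}, {"id": 2}],
              "paging": {"is_end": False, "next": "page2", "totals": 3}},
    "page2": {"data": [{"id": 3}],
              "paging": {"is_end": True, "next": "page3", "totals": 3}},
}

answers = list(iter_answers(fake_pages.__getitem__, "page1"))
print(answers)
```

In a real run, `fetch` would be a small wrapper around an HTTP client that sends the same headers as the browser request.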

Without further ado: you can simply iterate through the pages to grab every answer along with its posting time and content, and if you are up for it, run a sentiment analysis or something similar on the results. See my previous blog post for details; I will not include the Chinese sentiment analysis here. The source code is now available, and anyone who knows how to do this will understand everything by this point.
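To give an idea of what pulling the posting time and content out of one answer could involve, here is a small sketch. It assumes each answer is a dict with a `created_time` field in Unix seconds and a `content` field holding HTML; both field names are guesses and should be checked against the actual JSON.

```python
import html
import re
from datetime import datetime, timezone

TAG = re.compile(r"<[^>]+>")  # crude tag stripper, good enough for a sketch


def to_row(answer):
    """Turn one answer dict into a flat row with a readable time and plain text."""
    # `created_time` and `content` are assumed field names.
    created = datetime.fromtimestamp(answer["created_time"], tz=timezone.utc)
    text = html.unescape(TAG.sub("", answer["content"]))
    return {"created": created.strftime("%Y-%m-%d %H:%M:%S"), "text": text.strip()}


sample = {"created_time": 0, "content": "<p>Hello &amp; goodbye</p>"}
row = to_row(sample)
print(row)
```

Rows in this shape can be written out with `csv.DictWriter` and then fed to whatever sentiment analysis tool you prefer.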

Everyone has to take responsibility for their words and deeds. The Internet is not a lawless place.


Origin blog.csdn.net/weixin_40539952/article/details/107440633