【Data Analysis】Data Analysis Based on Pneumonia News Text

For the full article, see: [Data Analysis] Data Analysis Based on News Text

Data Collection

The data comes from the article "Memories of the 2020 Novel Coronavirus Pneumonia: Reports, Non-Fiction and Personal Narratives (Continuously Updated)". My method was to use a crawler to fetch that article's page, extract the list of articles it collects along with their original links, and then follow each link to get the article's full text. As of 2020-2-17, I had collected 1351 links. Analysis of these links shows that they come mainly from WeChat official accounts, Caijing, the Economic Observer, Fang Fang's blog, China Business Network topic pages, and Jiemian: 1324 of the 1351 links point to these sites, and the remaining 27 (less than 2%) were simply discarded. Crawlers were then used to fetch each article's content from its website, and the results were organized into an Excel spreadsheet.
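The post does not include the crawler code. Below is a minimal sketch of the same pipeline using only the Python standard library, under assumptions that are mine rather than the author's: the index page's article links are ordinary `<a href>` tags, a "source" is identified by the URL hostname, and minor sources are dropped with a count threshold (the author instead picked out the six sites by hand). Site-specific extraction of the article body is left out, and the table is written as UTF-8 CSV with a BOM (which Excel opens directly) rather than a native .xlsx file. All function names here are hypothetical.

```python
# Hypothetical sketch: collect links, drop minor sources, fetch pages, save a table.
import csv
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collects (title, url) pairs for every <a> tag with an absolute http(s) href."""

    def __init__(self):
        super().__init__()
        self.links = []    # list of (title, url)
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith(("http://", "https://")):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html):
    """Return the (title, url) pairs found in an index page's HTML."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def split_by_source(links, min_count):
    """Assumed heuristic: keep links whose hostname appears at least min_count times."""
    counts = Counter(urlparse(url).hostname for _, url in links)
    kept = [(t, u) for t, u in links if counts[urlparse(u).hostname] >= min_count]
    dropped = [(t, u) for t, u in links if counts[urlparse(u).hostname] < min_count]
    return kept, dropped


def fetch(url, timeout=10):
    """Download one page as text (needs network access)."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def build_rows(links, fetch_page=fetch):
    """Fetch every kept link; per-site body extraction would replace the raw page here."""
    return [(urlparse(u).hostname, t, u, fetch_page(u)) for t, u in links]


def save_table(rows, path):
    """Write rows to CSV; the utf-8-sig BOM lets Excel detect the encoding."""
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["source", "title", "url", "text"])
        writer.writerows(rows)


if __name__ == "__main__":
    sample = """
    <ul>
      <li><a href="https://news.example.com/a1">Report A</a></li>
      <li><a href="https://news.example.com/a2">Report B</a></li>
      <li><a href="https://other.example.org/x">Stray link</a></li>
      <li><a href="#top">Back to top</a></li>
    </ul>
    """
    links = extract_links(sample)
    kept, dropped = split_by_source(links, min_count=2)
    print(len(links), len(kept), len(dropped))  # 3 2 1
```

Run against the real index page (for example `extract_links(fetch(index_url))`, where `index_url` stands for that article's address), `split_by_source` plays the role of the manual step that kept 1324 links and dropped 27; the threshold is a guess that should be tuned by first inspecting the hostname counts. Passing `fetch_page` as a parameter keeps the network call swappable, so the per-site crawlers can be plugged in one at a time.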

Source: blog.csdn.net/qq_39451578/article/details/105450536