Sina Weibo Micro Index: collecting the comprehensive, mobile, and PC indexes built from massive user behavior and post data

  • Project Introduction

    1. The Micro Index is an index product that reflects how different topics and events are developing. It is computed statistically from Sina Weibo's massive user behavior data and post data.
    2. For each keyword it has indexed, the Micro Index provides Weibo-level data in three forms: the comprehensive index, the mobile index, and the PC index.
  • Project Example
    Take the keyword 'ZTE' as an example and request its three indexes. The Micro Index only covers a limited date range for each index:
    1) Overall trend: 2013-03-01 to present
    2) Mobile trend: 2014-01-06 to present
    3) PC trend: 2014-01-06 to present
    The example sets start_date = '2016-05-29' and end_date = '2018-05-29'. The raw results are as follows:

1. Raw Composite Index

2. Raw mobile/PC index
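The date ranges listed in the example above can be checked before any request is sent. The following is a minimal sketch, not part of the original project: the `EARLIEST` table, its keys, and the `clamp_range` helper are hypothetical names introduced here, and the start dates are simply the ones quoted above.

```python
from datetime import date

# Earliest available dates, copied from the ranges quoted in the example.
# The dictionary keys are hypothetical labels, not names used by Weibo.
EARLIEST = {
    "overall": date(2013, 3, 1),
    "mobile": date(2014, 1, 6),
    "pc": date(2014, 1, 6),
}


def clamp_range(kind, start_date, end_date, today=None):
    """Clamp a requested 'YYYY-MM-DD' range to what the given index covers.

    `today` can be injected so the function is deterministic in tests.
    """
    today = today or date.today()
    start = max(date.fromisoformat(start_date), EARLIEST[kind])
    end = min(date.fromisoformat(end_date), today)
    if start > end:
        raise ValueError(f"no {kind} data between {start_date} and {end_date}")
    return start.isoformat(), end.isoformat()


print(clamp_range("mobile", "2013-06-01", "2018-05-29", today=date(2018, 6, 1)))
# → ('2014-01-06', '2018-05-29')
```

With a helper like this, a request for mobile data starting in 2013 would be narrowed to 2014-01-06 instead of returning an unexpectedly short series.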

  • Implementation process
    # Main function
    def index_main(self, word, start_date, end_date):
        # Open the index page for the keyword
        print('step1, open page....')
        driver = self.search_index(word)
        # Build the request and fetch the index data as JSON
        print('step2, get data....')
        data = self.get_data(driver, start_date, end_date)
        # Check the response: if Weibo has not indexed this keyword, report it and skip saving
        if data['zt']:
            print('step3, save data ...')
            self.output_data(word, data)
            print('finished....')
        else:
            print('not recorded...')
        # Close the browser object
        driver.close()
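One fragile point in the main function: if get_data or output_data raises, driver.close() is never reached and the browser stays open. The sketch below shows the same control flow wrapped in try/finally. It is only an illustration: `FakeDriver` and `DemoIndex` are hypothetical stand-ins, because the real search_index, get_data, and output_data are not shown in this post.

```python
class FakeDriver:
    """Hypothetical stand-in for the browser object; only records close()."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class DemoIndex:
    """Hypothetical stand-in for SinaIndex with stubbed helper methods."""

    def __init__(self, payload):
        self.payload = payload  # what get_data should return (or raise)
        self.saved = []         # records calls to output_data
        self.driver = None

    def search_index(self, word):
        self.driver = FakeDriver()
        return self.driver

    def get_data(self, driver, start_date, end_date):
        if isinstance(self.payload, Exception):
            raise self.payload
        return self.payload

    def output_data(self, word, data):
        self.saved.append((word, data))

    def index_main(self, word, start_date, end_date):
        driver = self.search_index(word)
        try:
            data = self.get_data(driver, start_date, end_date)
            # .get() avoids a KeyError if the 'zt' field is missing
            if data.get('zt'):
                self.output_data(word, data)
                return True
            return False
        finally:
            # Runs on success, on "not indexed", and on exceptions alike
            driver.close()
```

Returning True or False instead of only printing also lets a caller decide what to do when a keyword is not indexed.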
  • Run
    def demo():
        start_date = '2016-05-29'
        end_date = '2018-05-29'
        sina = SinaIndex()
        search_word = '中兴'  # 'ZTE' in Chinese
        sina.index_main(search_word, start_date, end_date)

    demo()

3. Results

The saved data files were plotted locally. The results are shown below:

3.1 Composite index

3.2 Mobile Index

3.3 PC index

3.4 Index Comparison
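Comparing the three indexes on one chart requires the series to share the same dates, and the mobile and PC data start later than the overall data. The following sketch shows one way to align them before plotting. The input layout (a dict of date string to value per index) is an assumption made for illustration, not the format written by output_data, and the sample numbers are made up.

```python
def align_series(series_by_name):
    """Keep only dates present in every series and return aligned rows.

    `series_by_name` maps a label (e.g. 'overall') to a {date_string: value}
    dict. This layout is an assumption for illustration only.
    """
    names = list(series_by_name)
    common = set.intersection(*(set(s) for s in series_by_name.values()))
    dates = sorted(common)  # ISO 'YYYY-MM-DD' strings sort chronologically
    rows = [(d, *(series_by_name[n][d] for n in names)) for d in dates]
    return names, rows


# Made-up sample values, only to show the shape of the result
names, rows = align_series({
    "overall": {"2016-05-29": 120, "2016-05-30": 150, "2016-05-31": 90},
    "mobile": {"2016-05-30": 100, "2016-05-31": 60},
    "pc": {"2016-05-30": 50, "2016-05-31": 30},
})
for row in rows:
    print(row)
# → ('2016-05-30', 150, 100, 50)
# → ('2016-05-31', 90, 60, 30)
```

Each row can then be passed to whatever plotting tool is used locally, one line per index.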

4. Summary

1. In terms of difficulty, collecting the Micro Index falls between collecting the Baidu Index and the Ali Index. It has two characteristics: 1) the index data is loaded through dynamic JavaScript requests, so it can be obtained by constructing the same request and parsing the response; 2) no user login is required.
2. The Micro Index covers a wider date range than the Ali Index but a narrower one than the Baidu Index. Still, because the data reflects activity on Weibo specifically, it can offer fresh angles for related research.

The project source code is available at the link below:

https://download.csdn.net/download/sinat_39620217/88000970

Origin blog.csdn.net/sinat_39620217/article/details/131968759