A University Public Opinion Analysis and Monitoring Visualization System Based on Python and Flask

Table of contents

1. Introduction

2. Use Python to crawl public opinion data

1. Install requests library

2. Analyze data

3. Crawl data

3. Improve data crawling efficiency through proxy IP

1. Get proxy IP

2. Use proxy IP

4. Use the Flask framework to implement a public opinion monitoring visualization system

5. Use MongoDB to store data

6. Summary


1. Introduction


Public opinion monitoring is receiving more and more attention in today's society. With the development of Internet technology, the public opinion information once gathered through traditional media channels, official reports, questionnaires, and so on is gradually being replaced by content from the Internet. Because online content spreads quickly, is timely, and covers a wide range of topics, it has become an important way for managers, enterprises, and governments to understand public sentiment and grasp market trends.

This article introduces how to build a university public opinion analysis and monitoring visualization system in Python based on the Flask framework. It mainly covers five aspects:

  1. How to use Python to crawl public opinion data;
  2. How to improve data crawling efficiency through proxy IP;
  3. How to use the Flask framework to implement a public opinion monitoring visualization system;
  4. How to use MongoDB to store data;
  5. How to use ECharts to visualize data.

2. Use Python to crawl public opinion data


There are two main ways to obtain public opinion data. One is to call an API interface directly and retrieve the corresponding data from it. The other is to use Python to crawl the data from web pages.
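For completeness, here is a minimal sketch of the first approach. The endpoint, parameters, and response format below are purely hypothetical; a real public opinion API will differ and usually requires authentication.

import requests

# Hypothetical JSON API endpoint, used only for illustration
api_url = 'https://api.example.com/opinion'
resp = requests.get(api_url, params={'keyword': 'university', 'page': 1}, timeout=10)
data = resp.json()  # most APIs return JSON that can be used directly
print(data)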

This article introduces the second method of data acquisition, taking the China University Rankings website as an example.

1. Install requests library

To crawl website data with Python, you first need to install the requests library. requests is an HTTP client library for Python that sends HTTP requests and receives responses. Install it with the following command:

pip install requests
2. Analyze data

Before crawling, we need to analyze the page. Open the China University Rankings website and click "University Rankings" -> "Global Rankings"; the page URL is http://www.zuihaodaxue.com/ARWU2020.html.

From the page we can see that the rankings are displayed in a table. The fields we need to obtain are "ranking", "school name", "region", and "total score".
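Before writing the parser, it can help to fetch the page and print part of it to confirm how the table is laid out. A quick sketch (it simply prints the first table row; the selectors used in the next step assume this structure):

import requests
from bs4 import BeautifulSoup

# Fetch the rankings page and inspect the first table row to see how the cells are arranged
resp = requests.get('http://www.zuihaodaxue.com/ARWU2020.html')
resp.encoding = resp.apparent_encoding
soup = BeautifulSoup(resp.text, 'html.parser')
print(soup.find('tr'))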

3. Crawl data

After analyzing the page, we can start crawling the data. First, we need to import the requests and BeautifulSoup libraries (BeautifulSoup is provided by the beautifulsoup4 package).

import requests
from bs4 import BeautifulSoup

Next, we need to set the request headers. Here we set a User-Agent so that the request looks like it comes from an ordinary browser:

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

The headers tell the server who is making the request; many sites refuse or throttle requests that do not carry a browser-like User-Agent.

Next, we use the requests library to send a request, obtain the web page content, and parse the required data.

url = 'http://www.zuihaodaxue.com/ARWU2020.html'

# Send the request and decode the page with the detected encoding
response = requests.get(url, headers=headers)
response.encoding = response.apparent_encoding

soup = BeautifulSoup(response.text, 'html.parser')

# Each ranked university sits in a <tr class="bgfd"> row of the rankings table
all_university = soup.findAll('tr', {'class': 'bgfd'})
for university in all_university:
    rank = university.find('td', {'align': 'center'}).getText()  # ranking
    name = university.find('a').getText()  # school name
    region = university.find('div', {'style': 'padding-left:10px;'}).getText().strip()  # region
    score = university.findAll('td', {'align': 'center'})[-1].getText()  # total score
    print(rank, name, region, score)

In this way, we can obtain the rankings, school names, regions, and total score data of all universities.

However, it should be noted that if you crawl the website directly, your IP may be blocked. The next section will introduce how to improve the efficiency of data crawling through proxy IP.
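Before turning to proxies, one simple precaution is to slow down and pause between requests. A minimal sketch, where the page list and the 1-3 second delay range are only illustrative:

import time
import random
import requests

urls = ['http://www.zuihaodaxue.com/ARWU2020.html']  # pages to fetch; one entry here as an example
for page_url in urls:
    # headers is the User-Agent dict defined earlier
    response = requests.get(page_url, headers=headers, timeout=10)
    # ... parse response.text as shown above ...
    time.sleep(random.uniform(1, 3))  # wait 1-3 seconds before the next request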

3. Improve data crawling efficiency through proxy IP

When crawling, frequently visiting the same website may be detected, and our IP may be blocked so that we can no longer access the site normally. In this case we can use proxy IPs to avoid the problem: crawling through a proxy hides our real IP and gives better results.

1. Get proxy IP

There are many proxy IP providers on the Internet, and purchasing proxy IPs is one way to solve the blocking problem. Here we use the free IPs provided by Zdaye (https://www.zdaye.com/).

On the Zdaye website, each proxy entry provides the following information:

  • IP address
  • Port number
  • Region
  • Anonymity
  • Type
  • Survival time
  • Verification time

What we need are the IP address and the port number. By passing them to requests through the proxies parameter, we can crawl data through the proxy IP.

2. Use proxy IP

Using a proxy IP is very simple: just pass it to requests through the proxies parameter. For example, the following code uses a proxy IP obtained from the provider above to crawl data:

import requests

url = 'http://www.zuihaodaxue.com/ARWU2020.html'

# Proxy addresses in the form protocol://IP:port (the address here is only an example)
proxies = {'http': 'http://111.177.190.36:9999', 'https': 'https://111.177.190.36:9999'}
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

# The proxies argument routes the request through the proxy server
response = requests.get(url, headers=headers, proxies=proxies)

print(response.text)

Here we set a proxy IP in the format http://IP:port. When sending the request, pass it through the proxies parameter and the request will be routed through the proxy.
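In practice a single free proxy often fails, so it is common to rotate through a small pool. A minimal sketch, assuming the addresses in the pool are placeholders copied from the proxy provider:

import random
import requests

# Placeholder proxy pool; fill it with addresses obtained from the provider
proxy_pool = [
    {'http': 'http://111.177.190.36:9999', 'https': 'https://111.177.190.36:9999'},
]

def fetch(url, headers, retries=3):
    # Try a random proxy from the pool; fall back to a direct request if they all fail
    for _ in range(retries):
        proxy = random.choice(proxy_pool)
        try:
            return requests.get(url, headers=headers, proxies=proxy, timeout=10)
        except requests.RequestException:
            continue  # this proxy did not respond, try another one
    return requests.get(url, headers=headers, timeout=10)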

4. Use the Flask framework to implement a public opinion monitoring visualization system

Flask is a lightweight Python web framework for writing web-based applications. It is ideal for small applications and simple web services, but can also serve as the core for large-scale applications.

The Flask framework provides request routing, template rendering, and other core functions, and is very suitable for developing web applications and APIs.

When using the Flask framework to build a public opinion monitoring visualization system, we need to install the Flask and pymongo (used to connect to the MongoDB database) libraries, for example with pip install flask pymongo, and then use the following code to create a Flask application:

import json
from flask import Flask, render_template
from pymongo import MongoClient

app = Flask(__name__)

@app.route('/')
def index():
    # Connect to MongoDB and read every ranking record from the ARWU collection
    client = MongoClient('localhost', 27017)
    db = client['university']
    collection = db['ARWU']
    data_list = []
    for data in collection.find():
        del data['_id']  # the ObjectId is not JSON-serializable, so drop it
        data_list.append(data)
    # Pass the records to the template as a JSON string
    return render_template('index.html', data_list=json.dumps(data_list, ensure_ascii=False))

if __name__ == '__main__':
    app.run()

Here, localhost is the host name of the machine running MongoDB, and 27017 is the port MongoDB listens on. In addition, we can use Flask's request object to obtain data sent by the front end, for example:

from flask import request

@app.route('/api/search', methods=['GET'])
def search():
    # Read the keyword from the query string, e.g. /api/search?keyword=Harvard
    keyword = request.args.get('keyword')
    client = MongoClient('localhost', 27017)
    db = client['university']
    collection = db['ARWU']
    data_list = []
    # Fuzzy-match the school name against the keyword
    for data in collection.find({'name': {'$regex': keyword}}):
        del data['_id']
        data_list.append(data)
    return json.dumps(data_list, ensure_ascii=False)
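Once the application is running, the endpoint can be tested from Python or simply by opening the URL in a browser. The keyword below is only an example, and 127.0.0.1:5000 is Flask's default address:

import requests

# Call the search API defined above and print the matching universities
resp = requests.get('http://127.0.0.1:5000/api/search', params={'keyword': 'Harvard'})
print(resp.json())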

When using the Flask framework, we need to create a templates folder to store the HTML files.

In the templates folder, we need to create an index.html file to display data. The specific code is as follows:

<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>中国大学排名</title>
    <script src="https://cdn.jsdelivr.net/npm/echarts/dist/echarts.min.js"></script>
    <style>
        /* Set the container size */
        #main {
            height: 600px;
        }
    </style>
</head>
<body>
<!-- A container for displaying the data -->
<div id="main"></div>
<!-- Use JavaScript to render the chart -->
<script type="text/javascript">
    // Get the data passed from the backend; the |safe filter keeps Jinja2 from HTML-escaping the JSON
    var data = {{ data_list | safe }};
    // Initialize the ECharts chart
    var myChart = echarts.init(document.getElementById('main'));

    // Configure the chart options
    var option = {
        tooltip: {},
        legend: {
            data: ['总分']
        },
        xAxis: {
            data: data.map(function (item) {
                return item.name;
            })
        },
        yAxis: {},
        series: [{
            name: '总分',
            type: 'bar',
            data: data.map(function (item) {
                return item.score;
            })
        }]
    };

    // Render the chart with the options and data just specified.
    myChart.setOption(option);
</script>
</body>
</html>

Here we use the ECharts library (https://echarts.apache.org/) to visualize the data. Finally, run the app.py file from the command line to start the Flask application; by default it listens on http://127.0.0.1:5000/.

5. Use MongoDB to store data

In this example, we use MongoDB as data storage. MongoDB is a non-relational database. Compared with relational databases, MongoDB is more flexible, has better scalability, and supports massive data storage.

In Python, we can use the pymongo library to connect and operate MongoDB. The specific code is as follows:

from pymongo import MongoClient

# Connect to the local MongoDB instance and select the database and collection
client = MongoClient('localhost', 27017)
db = client['university']
collection = db['ARWU']

# Insert a single document
data = {'rank': '1', 'name': 'Harvard University', 'region': 'USA', 'score': '100'}
collection.insert_one(data)

# Query the documents matching a condition
result = collection.find({'region': 'USA'})
for data in result:
    print(data)

In the above code, we first connect to MongoDB and select the database and collection to operate on. Then, we insert a piece of data and query the data with specified conditions through the find method.
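For this system, the documents in the ARWU collection would normally come from the crawler in section 2 rather than being inserted one by one. A hedged sketch of writing the crawled rows in one batch, assuming the all_university list from the crawling code is available and reusing the same field names as above:

from pymongo import MongoClient

client = MongoClient('localhost', 27017)
collection = client['university']['ARWU']

rows = []
for university in all_university:  # all_university comes from the crawling code in section 2
    rows.append({
        'rank': university.find('td', {'align': 'center'}).getText(),
        'name': university.find('a').getText(),
        'region': university.find('div', {'style': 'padding-left:10px;'}).getText().strip(),
        'score': university.findAll('td', {'align': 'center'})[-1].getText(),
    })

if rows:
    collection.insert_many(rows)  # one bulk insert instead of many insert_one calls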

6. Summary

This article has introduced how to use Python to crawl public opinion data and how to improve crawling efficiency with proxy IPs. We have also seen how to use the Flask framework to build a public opinion monitoring visualization system and how to store data in MongoDB.

There are still many areas where this public opinion monitoring visualization system can be improved, such as updating data in real time and making the data visualization more interactive. I hope readers will explore and practice further on this basis.

 
