Deep Learning Practice 6: Convolutional Neural Network (PyTorch) + Cluster Analysis for Air Quality and Weather Prediction

Article directory

1. Preliminary work

  1. Import library package
  2. Import data
  3. Principal Component Analysis (PCA)
  4. Cluster analysis (K-means)

2. Building the neural network model
3. Training the model
4. Testing the model

Hello everyone, I am Weixue AI. Today I will show you how to recognize, classify, and predict air quality with a convolutional neural network (PyTorch version).
Haze is a form of air pollution, and PM2.5 is considered its main "culprit". The lower the daily average PM2.5 value, the better the air quality.
The main pollutants used to evaluate air quality are fine particulate matter (PM2.5), inhalable particulate matter (PM10), sulfur dioxide (SO2), nitrogen dioxide (NO2), ozone (O3), and carbon monoxide (CO).

We have collected weather indicator data for multiple cities; the samples are stored in weather.csv.

1. Preliminary work

1. Import library package

import torch
import torch.nn as nn
import torch.utils.data as Data
import numpy as np
import datetime
import csv
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

2. Import data

# Read the GB18030-encoded CSV file
data = pd.read_csv("weather.csv", encoding='gb18030')
# Drop the last column
data = data.drop(columns=data.columns[-1])
print(data)

3. Principal Component Analysis (PCA)

The weather data contains six variables, which makes the analysis more complex, and several of these variables are correlated with one another. We therefore asked whether, building on that correlation, a smaller set of new variables could replace the larger set of original ones. Principal component analysis (PCA) does exactly this: its basic idea is dimensionality reduction. The following code maps the original six variables onto two new variables.

pca = PCA(n_components=2)
new_pca = pd.DataFrame(pca.fit_transform(data))
X = new_pca.values
print(new_pca)
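To check how much information the two new variables keep, one option is to look at the `explained_variance_ratio_` attribute of the fitted PCA object. The sketch below is only an illustration: it uses synthetic data in place of weather.csv (the random array and its shape are assumptions), and it standardizes the columns first because PCA is sensitive to the scale of each variable.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the six indicators (assumption: 200 rows, 6 columns).
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
# Build six correlated columns from two hidden factors plus a little noise.
demo = base @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(200, 6))

# Standardize so that no single indicator dominates because of its units.
scaled = StandardScaler().fit_transform(demo)

pca_demo = PCA(n_components=2)
reduced = pca_demo.fit_transform(scaled)

print(reduced.shape)                             # (200, 2)
print(pca_demo.explained_variance_ratio_)        # share of variance kept by each component
print(pca_demo.explained_variance_ratio_.sum())  # total share kept by both components
```

If the total is close to 1, two components describe the data almost as well as the original six columns.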

4. Cluster analysis (K-means)

K-means is an iterative clustering algorithm. We first choose the number of clusters K. The algorithm randomly selects K objects as the initial cluster centers, computes the distance between each object and each center, and assigns each object to the center closest to it. A cluster center together with the objects assigned to it represents a cluster. Each time samples are assigned, the cluster centers are recalculated from the objects currently in each cluster. This process repeats until a termination condition is met, for example when the centers stop moving. Here we call the KMeans implementation in sklearn.cluster directly. When the data has no category labels, this kind of unsupervised cluster analysis can be used to assign a cluster label to each record.
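The steps above can be sketched in a few lines of NumPy. This is a minimal illustration of the standard batch formulation, in which the centers are recomputed after all samples have been assigned; it is not the scikit-learn implementation, and the function name `kmeans_sketch` and the toy data are hypothetical.

```python
import numpy as np

def kmeans_sketch(points, k, n_iter=100, seed=0):
    """Minimal K-means: assign points to the nearest center, then update the centers."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct samples as the initial centers.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: distance from every point to every center, shape (n, k).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each center to the mean of its assigned points
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Step 4: stop when the centers no longer move.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Two well-separated blobs as toy data (hypothetical values).
rng = np.random.default_rng(1)
toy = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
toy_labels, toy_centers = kmeans_sketch(toy, k=2)
print(toy_centers)
```

For real data the library implementation is the better choice, since it adds smarter initialization and multiple restarts.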

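kms = KMeans(n_clusters=6)  # 6 is the number of clusters
# Get the cluster label of each row (0 to 5)
Y = kms.fit_predict(data)
# Store the labels as 1 to 6, because the later code subtracts 1 to build the one-hot index
data['class'] = Y + 1
data.to_csv("weather_new.csv", index=False)  # Save the file

# Plot the cluster distribution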
d = new_pca[Y == 0]
plt.plot(d[0], d[1], 'r.')
d = new_pca[Y == 1]
plt.plot(d[0], d[1], 'g.')
d = new_pca[Y == 2]
plt.plot(d[0], d[1], 'b.')
d = new_pca[Y == 3]
plt.plot(d[0], d[1], 'y.')
d = new_pca[Y == 4]
plt.plot(d[0], d[1],'c.')
d = new_pca[Y == 5]
plt.plot(d[0], d[1],'k.')
plt.show()

In the figure, the data is divided into six clusters, each drawn in a different color. The figure shows intuitively that points located close together are assigned to the same cluster.
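The number of clusters is fixed at 6 here. When the right number is not known in advance, one common check is the silhouette score from scikit-learn, computed for several candidate values of K. The sketch below runs on toy data with three obvious groups (a hypothetical stand-in for the weather table), so it illustrates the procedure rather than the article's result.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Toy data with three obvious groups (hypothetical stand-in for the weather table).
rng = np.random.default_rng(0)
toy = np.vstack([
    rng.normal(loc, 0.3, size=(60, 2)) for loc in (0.0, 4.0, 8.0)
])

scores = {}
for k in range(2, 7):
    cluster_ids = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(toy)
    # The silhouette lies in [-1, 1]; higher means tighter, better separated clusters.
    scores[k] = silhouette_score(toy, cluster_ids)

best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```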

2. Building the neural network model

class MyNet(nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()
        self.con1 = nn.Sequential(
            nn.Conv1d(in_channels=1, out_channels=64, kernel_size=3, stride=1, padding=1),
            nn.MaxPool1d(kernel_size=1),
            nn.ReLU(),
        )
        self.con2 = nn.Sequential(
            nn.Conv1d(in_channels=64, out_channels=128, kernel_size=3, stride=1, padding=1),
            nn.MaxPool1d(kernel_size=1),
            nn.ReLU(),
        )
        self.fc = nn.Sequential(
            # Linear classifier: 128 channels x 6 positions are flattened to 768 inputs
            nn.Linear(128*6*1, 128),
            nn.ReLU(),
            nn.Linear(128, 6),
            # nn.Softmax(dim=1),
        )
        self.mls = nn.MSELoss()
        self.opt = torch.optim.Adam(params=self.parameters(), lr=1e-3)
        self.start = datetime.datetime.now()

    def forward(self, inputs):
        out = self.con1(inputs)
        out = self.con2(out)
        out = out.view(out.size(0), -1)  # Flatten each sample into a 1-D vector
        out = self.fc(out)
        return out

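    # Runs one optimization step on a batch and returns the loss.
    # Note: this overrides nn.Module.train(), so net.eval() cannot be called on this class.
    def train(self, x, y):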
        out = self.forward(x)
        loss = self.mls(out, y)
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()

        return loss

    def test(self, x):
        out = self.forward(x)
        return out

    def get_data(self):
        with open('weather_new.csv', 'r') as f:
            results = csv.reader(f)
            results = [row for row in results]
            results = results[1:1500]
        inputs = []
        labels = []
        for result in results:
            # One-hot encoding of the 1-based class label
            one_hot = [0 for i in range(6)]
            index = int(result[6])-1
            one_hot[index] = 1
            labels.append(one_hot)
            input = result[:6]
            input = [float(x) for x in input]
            
            inputs.append(input)
        
        inputs = np.array(inputs)
        labels = np.array(labels)
        inputs = torch.from_numpy(inputs).float()
        inputs = torch.unsqueeze(inputs, 1)

        labels = torch.from_numpy(labels).float()
        return inputs, labels

    def get_test_data(self):
        with open('weather_new.csv', 'r') as f:
            results = csv.reader(f)
            results = [row for row in results]
            results = results[1500: 1817]
        inputs = []
        labels = []
        for result in results:
            label = [result[6]]
            input = result[:6]
            input = [float(x) for x in input]
            label = [float(y) for y in label]
            inputs.append(input)
            labels.append(label)
        inputs = np.array(inputs)
        
        inputs = torch.from_numpy(inputs).float()
        inputs = torch.unsqueeze(inputs, 1)
        labels = np.array(labels)
        labels = torch.from_numpy(labels).float()
        return inputs, labels
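To see why the first linear layer expects 128*6 inputs, it can help to trace the tensor shapes through the two convolution blocks. The sketch below rebuilds only those layers with the same hyperparameters as MyNet and feeds a random batch (the batch size of 4 is an arbitrary choice for illustration).

```python
import torch
import torch.nn as nn

# A batch of 4 samples, 1 channel, 6 features: the layout produced by unsqueeze(inputs, 1).
x = torch.randn(4, 1, 6)

conv1 = nn.Conv1d(in_channels=1, out_channels=64, kernel_size=3, stride=1, padding=1)
conv2 = nn.Conv1d(in_channels=64, out_channels=128, kernel_size=3, stride=1, padding=1)
pool = nn.MaxPool1d(kernel_size=1)  # kernel_size=1 leaves the length unchanged

h1 = pool(conv1(x))
h2 = pool(conv2(h1))
flat = h2.view(h2.size(0), -1)

print(h1.shape)    # torch.Size([4, 64, 6])
print(h2.shape)    # torch.Size([4, 128, 6])
print(flat.shape)  # torch.Size([4, 768])
```

With kernel_size=3 and padding=1 the sequence length stays at 6, so flattening 128 channels gives 128*6 = 768 values per sample.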

3. Training the model

EPOCH = 100
BATCH_SIZE = 50

net = MyNet()
x_data, y_data = net.get_data()
torch_dataset = Data.TensorDataset(x_data, y_data)
loader = Data.DataLoader(
    dataset=torch_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,
    # Load in the main process; worker processes need an
    # `if __name__ == '__main__':` guard on Windows and macOS
    num_workers=0,
)
for epoch in range(EPOCH):
    for step, (batch_x, batch_y) in enumerate(loader):
        a = net.train(batch_x, batch_y)  # one optimization step; returns the batch loss
        print('step:', step, a)
# Save the model
torch.save(net, 'net.pkl')

Start training:

step: 0 tensor(3.6822, grad_fn=<MseLossBackward0>)
step: 1 tensor(61.2186, grad_fn=<MseLossBackward0>)
step: 2 tensor(18.2877, grad_fn=<MseLossBackward0>)
step: 3 tensor(4.3641, grad_fn=<MseLossBackward0>)
step: 4 tensor(6.7846, grad_fn=<MseLossBackward0>)
step: 5 tensor(9.4255, grad_fn=<MseLossBackward0>)
step: 6 tensor(5.4232, grad_fn=<MseLossBackward0>)
step: 7 tensor(4.1342, grad_fn=<MseLossBackward0>)
step: 8 tensor(2.0944, grad_fn=<MseLossBackward0>)
step: 9 tensor(1.4549, grad_fn=<MseLossBackward0>)
step: 10 tensor(0.9372, grad_fn=<MseLossBackward0>)
step: 11 tensor(1.0688, grad_fn=<MseLossBackward0>)
step: 12 tensor(0.6717, grad_fn=<MseLossBackward0>)
step: 13 tensor(0.6158, grad_fn=<MseLossBackward0>)
step: 14 tensor(0.6889, grad_fn=<MseLossBackward0>)
step: 15 tensor(0.5306, grad_fn=<MseLossBackward0>)
step: 16 tensor(0.5781, grad_fn=<MseLossBackward0>)
step: 17 tensor(0.3959, grad_fn=<MseLossBackward0>)
step: 18 tensor(0.4629, grad_fn=<MseLossBackward0>)
step: 19 tensor(0.3646, grad_fn=<MseLossBackward0>)
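The model above is trained with MSELoss against one-hot targets. A more common choice for multi-class classification is `nn.CrossEntropyLoss`, which takes raw logits and integer class indices, so no one-hot encoding is needed. The sketch below shows that variant on a small stand-in network with random data; the network, the data, and the number of steps are assumptions for illustration, not the article's setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in classifier: 6 input features -> 6 class logits (not the article's MyNet).
model = nn.Sequential(nn.Linear(6, 32), nn.ReLU(), nn.Linear(32, 6))
criterion = nn.CrossEntropyLoss()  # expects logits and integer class indices
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Random features and labels in 0..5 (hypothetical data for illustration).
features = torch.randn(50, 6)
targets = torch.randint(0, 6, (50,))

losses = []
for _ in range(200):
    logits = model(features)
    loss = criterion(logits, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())  # .item() gives a plain float for logging

print(losses[0], losses[-1])
```

Calling `.item()` on the loss also keeps the log free of the `tensor(..., grad_fn=...)` wrapper seen in the output above.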

4. Testing the model

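# Load the model. The whole module was pickled, so MyNet must be defined first.
# weights_only=False is needed on PyTorch versions where torch.load defaults to
# weights-only loading (the argument itself requires a version that supports it).
net = torch.load('net.pkl', weights_only=False)
x_data, y_data = net.get_test_data()
torch_dataset = Data.TensorDataset(x_data, y_data)
loader = Data.DataLoader(
    dataset=torch_dataset,
    batch_size=100,
    shuffle=False,
    num_workers=0,
)
num_success = 0
num_sum = len(torch_dataset)  # 317 test samples in the original run
for step, (batch_x, batch_y) in enumerate(loader):
    output = net.test(batch_x)
    y = batch_y.detach().numpy()
    for index, i in enumerate(output):
        i = i.detach().numpy()
        i = i.tolist()
        j = i.index(max(i))  # position of the largest output = predicted class - 1
        print('Output: {}  Label: {}'.format(j+1, y[index][0]))
        loss = j+1-y[index][0]
        if loss == 0.0:
            num_success += 1
# Report the accuracy once, after all batches have been processed
print('Accuracy: {}'.format(num_success/num_sum))

Output result:

....
Output: 3  Label: 3.0
Output: 4  Label: 4.0
Output: 5  Label: 5.0
Output: 1  Label: 1.0
Output: 3  Label: 3.0
Output: 3  Label: 3.0
Output: 3  Label: 3.0
Output: 4  Label: 4.0
Accuracy: 0.9495268138801262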
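The per-sample loop above can also be written with tensor operations. The sketch below computes the same kind of accuracy with `argmax`; the logits and labels are made-up values for illustration, with labels numbered from 1 as in the test data.

```python
import torch

# Hypothetical outputs for 5 samples over 6 classes, and their 1-based labels.
logits = torch.tensor([
    [0.1, 0.2, 2.0, 0.0, 0.1, 0.3],
    [0.0, 0.1, 0.2, 3.0, 0.1, 0.0],
    [0.2, 0.1, 0.0, 0.1, 1.5, 0.3],
    [2.5, 0.1, 0.0, 0.2, 0.1, 0.0],
    [0.1, 1.0, 0.9, 0.0, 0.2, 0.1],
])
labels = torch.tensor([3.0, 4.0, 5.0, 1.0, 3.0])

with torch.no_grad():
    predicted = logits.argmax(dim=1) + 1  # back to 1-based class numbers
    correct = (predicted == labels.long()).sum().item()
    accuracy = correct / len(labels)

print(predicted.tolist())  # [3, 4, 5, 1, 2]
print(accuracy)            # 0.8
```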

The model's prediction accuracy is 94.95%.
The "fcm" field in the data holds the category of each record:
Number 1 means: air quality: excellent;
Number 2 means: air quality: good;
Number 3 means: air quality: light pollution;
Number 4 means: air quality: moderate pollution, smog;
Number 5 means: air quality: heavy pollution, smog;
Number 6 means: air quality: severe pollution, smog.
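One caveat when the categories come from K-means: the cluster ids it returns are arbitrary, so a given number does not automatically correspond to a given air quality grade. A possible way to obtain ordered grades, shown here only as an assumption and not necessarily what was done for this data set, is to rank the clusters by their mean PM2.5 value. The column names and values below are hypothetical.

```python
import pandas as pd

# Toy table: a hypothetical "PM2.5" column plus arbitrary K-means cluster ids.
demo = pd.DataFrame({
    "PM2.5": [12, 15, 80, 85, 40, 42],
    "cluster": [2, 2, 0, 0, 1, 1],
})

# Rank clusters by their mean PM2.5: the cleanest cluster gets grade 1.
order = demo.groupby("cluster")["PM2.5"].mean().sort_values().index
grade_of = {int(cluster_id): grade for grade, cluster_id in enumerate(order, start=1)}
demo["grade"] = demo["cluster"].map(grade_of)

print(grade_of)                # {2: 1, 1: 2, 0: 3}
print(demo["grade"].tolist())  # [1, 1, 3, 3, 2, 2]
```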
Send me a private message to get the data set! More in-depth, hands-on content is coming, so stay tuned!

Previous works: Deep Learning Practical Projects

1. Deep learning practice 1-(keras framework) enterprise data analysis and prediction

2. Deep learning practice 2-(keras framework) enterprise credit rating and prediction

3. Deep Learning Practice 3-Text Convolutional Neural Network (TextCNN) News Text Classification

4. Deep Learning Practice 4 - Convolutional Neural Network (DenseNet) Mathematical Graphics Recognition + Topic Pattern Recognition

5. Deep Learning Practice 5-Convolutional Neural Network (CNN) Chinese OCR Recognition Project

6. Deep Learning Practice 6- Convolutional Neural Network (Pytorch) + Cluster Analysis to Realize Air Quality and Weather Prediction

7. Deep learning practice 7-Sentiment analysis of e-commerce product reviews

8. Deep Learning Practice 8-Life Photo Transformation Comic Photo Application

9. Deep learning practice 9-text generation image-local computer realizes text2img

10. Deep learning practice 10-mathematical formula recognition-converting pictures to Latex (img2Latex)

11. Deep learning practice 11 (advanced version) - fine-tuning application of BERT model - text classification case

12. Deep Learning Practice 12 (Advanced Edition) - Using Dewarp to Correct Text Distortion

13. Deep learning practice 13 (advanced version) - text error correction function, good luck for friends who often write typos

14. Deep learning practice 14 (advanced version) - handwritten text OCR recognition, handwritten notes can also be recognized

15. Deep Learning Practice 15 (Advanced Edition) - Let the machine do reading comprehension + you can become a question maker and ask questions

16. Deep learning practice 16 (advanced version) - virtual screenshot recognition text - can do paper contract and form recognition

17. Deep Learning Practice 17 (Advanced Edition) - Construction and Development Case of Intelligent Assistant Editing Platform System

18. Deep Learning Practice 18 (Advanced Edition) - 15 tasks of NLP fusion system, which can realize the NLP tasks you can think of on the market

19. Deep Learning Practice 19 (Advanced Edition)-ChatGPT's local implementation deployment test, ChatGPT can be implemented on your own platform

…(pending upgrade)

Origin blog.csdn.net/weixin_42878111/article/details/124732091