Article directory
1. Preliminary work
- Import library package
- Import data
- Principal Component Analysis (PCA)
- Cluster analysis (K-means)
2. Neural network model establishment
3. Training model
4. Check the model
Hello everyone, I am Weixue AI. Today I will walk you through recognizing, classifying and predicting air quality with a convolutional neural network (PyTorch version).
Haze is a form of air pollution, and PM2.5 is widely regarded as its main "culprit". The lower the daily average PM2.5 value, the better the air quality.
The main pollutants used for air quality evaluation are fine particulate matter (PM2.5), inhalable particulate matter (PM10), sulfur dioxide (SO2), nitrogen dioxide (NO2), ozone (O3), and carbon monoxide (CO).
We have collected these weather indicator data for multiple cities and use them as the data set for this article.
1. Preliminary work
1. Import library package
import torch
import torch.nn as nn
import torch.utils.data as Data
import numpy as np
import pymysql
import datetime
import csv
import time
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
2. Import data
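# The CSV file is encoded in gb18030
data = pd.read_csv("weather.csv", encoding='gb18030')
# Drop the last column so that only the six indicator columns remain
data = data.drop(columns=data.columns[-1])
print(data)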
3. Principal Component Analysis (PCA)
The weather data contain six variables, which makes the analysis more complex, and several of these variables are correlated with one another. We therefore asked whether, based on this correlation, a smaller number of new variables could replace the larger number of original ones. Principal component analysis (PCA) does exactly this: its basic idea is dimensionality reduction. The following code maps the original six variables onto two new variables (the principal components) through a linear transformation.
pca = PCA(n_components=2)
new_pca = pd.DataFrame(pca.fit_transform(data))
X = new_pca.values
print(new_pca)
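To make the dimensionality reduction more concrete, here is a minimal sketch of what PCA computes, written with NumPy only. The function name `pca_project` and the synthetic six-column array are assumptions made for illustration; they are not the article's data or code. Like sklearn's PCA, the sketch centres the data and uses a singular value decomposition, although the signs of the components may differ.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project X (n_samples, n_features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)            # PCA works on mean-centred data
    # Rows of Vt are the principal directions, ordered by singular value
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    scores = X_centered @ Vt[:n_components].T  # coordinates in the new basis
    explained = (S ** 2) / np.sum(S ** 2)      # share of variance per component
    return scores, explained[:n_components]

# Synthetic stand-in for the six pollutant columns (assumed data):
# two hidden factors drive all six variables, plus a little noise.
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
X_demo = factors @ mixing + 0.05 * rng.normal(size=(200, 6))

scores, ratio = pca_project(X_demo, n_components=2)
print(scores.shape)   # (200, 2)
print(ratio.sum())    # close to 1, because only two factors generated the data
```

When the six variables are strongly correlated, as in this synthetic example, two components retain almost all of the variance. On real data it is worth checking this ratio before settling on two components.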
4. Cluster analysis (K-means)
K-means is an iterative clustering algorithm. The main steps are: we choose the number of clusters K; the algorithm randomly selects K objects as the initial cluster centers; it then calculates the distance between each object and each cluster center and assigns each object to the nearest center. A cluster center together with the objects assigned to it represents a cluster. Each time a sample is assigned, the center of its cluster is recalculated from the objects currently in that cluster. This process repeats until a termination condition is met, for example when the cluster centers stop changing. Here we directly call the KMeans implementation in sklearn.cluster. When the data have no category labels, we can use this unsupervised clustering to assign a cluster label to each record.
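kms = KMeans(n_clusters=6)  # 6 is the number of clusters
# Get the class labels
Y = kms.fit_predict(data)
# KMeans returns labels 0..5; store them as 1..6, which is what the training code expects
data['class'] = Y + 1
data.to_csv("weather_new.csv", index=False)  # save the file
# Plot the cluster distribution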
d = new_pca[Y == 0]
plt.plot(d[0], d[1], 'r.')
d = new_pca[Y == 1]
plt.plot(d[0], d[1], 'g.')
d = new_pca[Y == 2]
plt.plot(d[0], d[1], 'b.')
d = new_pca[Y == 3]
plt.plot(d[0], d[1], 'y.')
d = new_pca[Y == 4]
plt.plot(d[0], d[1],'c.')
d = new_pca[Y == 5]
plt.plot(d[0], d[1],'k.')
plt.show()
In the figure, the data are divided into six categories, each shown in a different color. It is easy to see that points lying close together fall into the same category.
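The K-means steps described in this section can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the sklearn implementation: the function name `kmeans` and the synthetic blobs are hypothetical, and the sketch uses the common batch variant that recomputes the centers after all samples have been assigned. sklearn's KMeans also uses k-means++ initialisation by default instead of purely random starting points.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain batch K-means: assign to nearest centre, then update centres."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct samples as the initial cluster centres
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: distance from every sample to every centre, shape (n, k)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centre to the mean of its assigned samples
        # (an empty cluster keeps its previous centre)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Step 4: stop once the centres no longer move
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Three synthetic blobs in 2D (assumed demo data, not the weather set)
rng = np.random.default_rng(1)
blobs = np.vstack([
    rng.normal(loc=c, scale=0.3, size=(50, 2))
    for c in ([0, 0], [5, 5], [0, 5])
])
labels, centers = kmeans(blobs, k=3)
print(np.bincount(labels, minlength=3))   # number of samples per cluster
```

Because the starting centers are random, different seeds can give different cluster numbering and occasionally a poorer local optimum, which is why the cluster IDs themselves carry no inherent meaning.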
2. Neural network model establishment
class MyNet(nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()
        # First 1D convolution block: 1 input channel -> 64 channels
        self.con1 = nn.Sequential(
            nn.Conv1d(in_channels=1, out_channels=64, kernel_size=3, stride=1, padding=1),
            nn.MaxPool1d(kernel_size=1),
            nn.ReLU(),
        )
        # Second 1D convolution block: 64 channels -> 128 channels
        self.con2 = nn.Sequential(
            nn.Conv1d(in_channels=64, out_channels=128, kernel_size=3, stride=1, padding=1),
            nn.MaxPool1d(kernel_size=1),
            nn.ReLU(),
        )
        self.fc = nn.Sequential(
            # Linear classifier
            nn.Linear(128*6*1, 128),
            nn.ReLU(),
            nn.Linear(128, 6),
            # nn.Softmax(dim=1),
        )
        self.mls = nn.MSELoss()
        self.opt = torch.optim.Adam(params=self.parameters(), lr=1e-3)
        self.start = datetime.datetime.now()

    def forward(self, inputs):
        out = self.con1(inputs)
        out = self.con2(out)
        out = out.view(out.size(0), -1)  # flatten to one dimension per sample
        out = self.fc(out)
        return out

    # Note: this method shadows nn.Module.train(mode), so net.eval() cannot be used on this class
    def train(self, x, y):
        out = self.forward(x)
        loss = self.mls(out, y)
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        return loss

    def test(self, x):
        out = self.forward(x)
        return out

    def get_data(self):
        # Training set: data rows 1 to 1499 (row 0 is the header)
        with open('weather_new.csv', 'r') as f:
            results = csv.reader(f)
            results = [row for row in results]
            results = results[1:1500]
        inputs = []
        labels = []
        for result in results:
            # One-hot encoding of the 1-based class label
            one_hot = [0 for i in range(6)]
            index = int(result[6]) - 1
            one_hot[index] = 1
            labels.append(one_hot)
            features = [float(x) for x in result[:6]]
            inputs.append(features)
        inputs = np.array(inputs)
        labels = np.array(labels)
        inputs = torch.from_numpy(inputs).float()
        inputs = torch.unsqueeze(inputs, 1)  # add the channel dimension: (N, 1, 6)
        labels = torch.from_numpy(labels).float()
        return inputs, labels

    def get_test_data(self):
        # Test set: rows 1500 to 1816, with the raw class label instead of one-hot
        with open('weather_new.csv', 'r') as f:
            results = csv.reader(f)
            results = [row for row in results]
            results = results[1500: 1817]
        inputs = []
        labels = []
        for result in results:
            label = [float(result[6])]
            features = [float(x) for x in result[:6]]
            inputs.append(features)
            labels.append(label)
        inputs = np.array(inputs)
        inputs = torch.from_numpy(inputs).float()
        inputs = torch.unsqueeze(inputs, 1)  # add the channel dimension: (N, 1, 6)
        labels = np.array(labels)
        labels = torch.from_numpy(labels).float()
        return inputs, labels
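The first linear layer takes `128*6*1` inputs. The sketch below shows where that number comes from by applying the output-length formula given in the PyTorch documentation for `nn.Conv1d` and `nn.MaxPool1d` to each layer. The helper name `conv1d_out_len` is hypothetical, and the calculation is pure Python, so it only traces shapes and does not run the network.

```python
import math

def conv1d_out_len(l_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of nn.Conv1d / nn.MaxPool1d along the last dimension."""
    return math.floor(
        (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1
    )

length = 6                                                  # six pollutant features
length = conv1d_out_len(length, kernel_size=3, padding=1)   # con1 Conv1d    -> 6
length = conv1d_out_len(length, kernel_size=1, stride=1)    # con1 MaxPool1d -> 6
length = conv1d_out_len(length, kernel_size=3, padding=1)   # con2 Conv1d    -> 6
length = conv1d_out_len(length, kernel_size=1, stride=1)    # con2 MaxPool1d -> 6

flat_features = 128 * length   # 128 output channels from con2
print(length, flat_features)   # 6 768
```

With `kernel_size=3` and `padding=1` the convolutions preserve the length of 6, and a max-pooling window of size 1 leaves the tensor unchanged, so flattening the 128 channels gives 768 values per sample.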
3. Training model
EPOCH = 100
BATCH_SIZE = 50
net = MyNet()
x_data, y_data = net.get_data()
torch_dataset = Data.TensorDataset(x_data, y_data)
loader = Data.DataLoader(
    dataset=torch_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,
    num_workers=2,
)
for epoch in range(EPOCH):
    for step, (batch_x, batch_y) in enumerate(loader):
        # print(step)
        # print(step,'batch_x={}; batch_y={}'.format(batch_x, batch_y))
        a = net.train(batch_x, batch_y)
        print('step:', step, a)
# Save the model
torch.save(net, 'net.pkl')
Start training:
step: 0 tensor(3.6822, grad_fn=<MseLossBackward0>)
step: 1 tensor(61.2186, grad_fn=<MseLossBackward0>)
step: 2 tensor(18.2877, grad_fn=<MseLossBackward0>)
step: 3 tensor(4.3641, grad_fn=<MseLossBackward0>)
step: 4 tensor(6.7846, grad_fn=<MseLossBackward0>)
step: 5 tensor(9.4255, grad_fn=<MseLossBackward0>)
step: 6 tensor(5.4232, grad_fn=<MseLossBackward0>)
step: 7 tensor(4.1342, grad_fn=<MseLossBackward0>)
step: 8 tensor(2.0944, grad_fn=<MseLossBackward0>)
step: 9 tensor(1.4549, grad_fn=<MseLossBackward0>)
step: 10 tensor(0.9372, grad_fn=<MseLossBackward0>)
step: 11 tensor(1.0688, grad_fn=<MseLossBackward0>)
step: 12 tensor(0.6717, grad_fn=<MseLossBackward0>)
step: 13 tensor(0.6158, grad_fn=<MseLossBackward0>)
step: 14 tensor(0.6889, grad_fn=<MseLossBackward0>)
step: 15 tensor(0.5306, grad_fn=<MseLossBackward0>)
step: 16 tensor(0.5781, grad_fn=<MseLossBackward0>)
step: 17 tensor(0.3959, grad_fn=<MseLossBackward0>)
step: 18 tensor(0.4629, grad_fn=<MseLossBackward0>)
step: 19 tensor(0.3646, grad_fn=<MseLossBackward0>)
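Each value in the log above is the mean squared error between the six network outputs and the one-hot target, averaged over every element in the batch (the default behaviour of `nn.MSELoss`). The pure-Python sketch below reproduces that calculation on made-up numbers. The helper names and the output values are hypothetical and are not taken from the trained model.

```python
def one_hot(label, num_classes=6):
    """Encode a 1-based class label as a one-hot list, as get_data does."""
    vec = [0.0] * num_classes
    vec[label - 1] = 1.0
    return vec

def mse_loss(outputs, targets):
    """Mean of squared differences over every element of the batch."""
    total, count = 0.0, 0
    for out_row, tgt_row in zip(outputs, targets):
        for o, t in zip(out_row, tgt_row):
            total += (o - t) ** 2
            count += 1
    return total / count

# Hypothetical network outputs for a batch of two samples
outputs = [
    [0.1, 0.8, 0.0, 0.1, 0.0, 0.0],   # highest score for class 2
    [0.0, 0.0, 0.6, 0.3, 0.1, 0.0],   # highest score for class 3
]
targets = [one_hot(2), one_hot(3)]
loss = mse_loss(outputs, targets)
print(round(loss, 4))   # 0.0267
```

A loss near zero therefore means the output vector is close to the one-hot target, which is why the values in the log shrink as training proceeds.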
4. Check the model
# Load the model
# (on PyTorch 2.6 or later, pass weights_only=False, because the whole module object was saved)
net = torch.load('net.pkl')
x_data, y_data = net.get_test_data()
torch_dataset = Data.TensorDataset(x_data, y_data)
loader = Data.DataLoader(
    dataset=torch_dataset,
    batch_size=100,
    shuffle=False,
    num_workers=1,
)
num_success = 0
num_sum = 317  # number of test samples
for step, (batch_x, batch_y) in enumerate(loader):
    # print(step)
    output = net.test(batch_x)
    # output = output.detach().numpy()
    y = batch_y.detach().numpy()
    for index, i in enumerate(output):
        i = i.detach().numpy()
        i = i.tolist()
        j = i.index(max(i))  # position of the largest output = predicted class - 1
        print('output: {}, label: {}'.format(j+1, y[index][0]))
        loss = j+1-y[index][0]
        if loss == 0.0:
            num_success += 1
print('accuracy: {}'.format(num_success/num_sum))
Output result:
....
output: 3, label: 3.0
output: 4, label: 4.0
output: 5, label: 5.0
output: 1, label: 1.0
output: 3, label: 3.0
output: 3, label: 3.0
output: 3, label: 3.0
output: 4, label: 4.0
accuracy: 0.9495268138801262
The model's prediction accuracy on the test set is 94.95% (301 of the 317 test samples).
The "fcm" field in the data holds the class label, which indicates the air quality category:
Number 1 means: air quality: excellent;
Number 2 means: air quality: good;
Number 3 means: air quality: light pollution;
Number 4 means: air quality: moderate pollution, smog;
Number 5 means: air quality: heavy pollution, smog;
Number 6 means: air quality: severe pollution, smog.
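The accuracy reported in this section is the fraction of test samples whose highest-scoring output matches the true 1-based label. A minimal sketch of that calculation, using made-up scores in place of real model outputs (the helper names `predict_label` and `accuracy` are hypothetical):

```python
def predict_label(scores):
    """Return the 1-based class whose output score is largest."""
    return scores.index(max(scores)) + 1

def accuracy(all_scores, labels):
    """Fraction of samples whose predicted class equals the true label."""
    correct = sum(
        1 for scores, label in zip(all_scores, labels)
        if predict_label(scores) == int(label)
    )
    return correct / len(labels)

# Hypothetical outputs for four samples (not the article's data)
demo_scores = [
    [0.9, 0.1, 0.0, 0.0, 0.0, 0.0],   # predicted class 1
    [0.1, 0.2, 0.7, 0.0, 0.0, 0.0],   # predicted class 3
    [0.0, 0.1, 0.2, 0.6, 0.1, 0.0],   # predicted class 4
    [0.0, 0.5, 0.4, 0.1, 0.0, 0.0],   # predicted class 2
]
demo_labels = [1.0, 3.0, 4.0, 3.0]     # the last sample is misclassified
acc = accuracy(demo_scores, demo_labels)
print(acc)   # 0.75
```

Dividing by `len(labels)` instead of a hard-coded sample count keeps the result correct if the size of the test split changes.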
Private message me to get the data set! More in-depth, hands-on content is on the way, so stay tuned!