Using Python and the PyTorch deep learning framework to build a Transformer neural network model for red wine classification

1. Introduction to the red wine dataset

The classic red wine classification dataset is the Wine dataset from the UCI Machine Learning Repository. It contains 178 samples, each with 13 features, and is well suited to classification tasks.

The meaning of each field is as follows:
alcohol: alcohol content (percentage)
malic_acid: malic acid content (g/L)
ash: ash content (g/L)
alcalinity_of_ash: alkalinity of ash (mEq/L)
magnesium: magnesium content (mg/L)
total_phenols: total phenol content (mg/L)
flavanoids: flavonoid content (mg/L)
nonflavanoid_phenols: non-flavonoid phenol content (mg/L)
proanthocyanins: proanthocyanin content (mg/L)
color_intensity: color intensity (absorbance at a 1 cm path length)
hue: hue, i.e. the tendency or similarity of the color (a number between 1 and 10)
od280/od315_of_diluted_wines: optical density ratio of diluted wine samples, used to measure the concentration of various compounds in the wine
proline: proline content (mg/L), a natural amino acid related to the quality and taste of the wine.

2. Red wine data set analysis

2.1 Load the red wine dataset

# Load the red wine dataset
from sklearn.datasets import load_wine

wineBunch = load_wine()
type(wineBunch)

sklearn.utils.Bunch
sklearn.utils.Bunch is a data container in the scikit-learn library, similar to a Python dictionary: it can hold any number of items of any type, and its entries can be accessed either by key or, with the dot (.) operator, as attributes. A Bunch is commonly used to package a machine-learning dataset (the feature matrix, the associated target vector, the feature names, and so on) so that the data can be organized and passed to a model for training or prediction.
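
For instance, a Bunch's fields can be read either as attributes or as dictionary keys (a minimal sketch; these are fields that load_wine actually returns):

from sklearn.datasets import load_wine

wineBunch = load_wine()
print(wineBunch.data.shape)       # feature matrix, (178, 13)
print(wineBunch.target.shape)     # target vector, (178,)
print(wineBunch.feature_names)    # list of the 13 feature names
print(wineBunch["target_names"])  # dict-style access also works: ['class_0' 'class_1' 'class_2']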

2.2 Shape of Red Wine Dataset

len(wineBunch.data),len(wineBunch.target)

(178, 178)

2.3 Print the first 5 and last 5 rows of the red wine dataset

import pandas as pd

featuresDf = pd.DataFrame(data=wineBunch.data, columns=wineBunch.feature_names)  # feature data
labelDf = pd.DataFrame(data=wineBunch.target, columns=["target"])                # label data
wineDf = pd.concat([featuresDf, labelDf], axis=1)      # concatenate along columns
pd.concat([wineDf.head(5), wineDf.tail(5)])            # first and last 5 rows (DataFrame.append was removed in pandas 2.0)

(Screenshot in the original post: the resulting table with the first and last 5 rows of wineDf.)

2.4 Column names of red wine dataset

wineDf.columns

Index(['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium',
'total_phenols', 'flavanoids', 'nonflavanoid_phenols',
'proanthocyanins', 'color_intensity', 'hue',
'od280/od315_of_diluted_wines', 'proline', 'target'],
dtype='object')

2.5 Target Labels for Red Wine Dataset

print(wineDf.target.unique())
[0 1 2]
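
The three integer labels correspond to three wine cultivars. A quick check of the class names and the number of samples per class (a small sketch, reusing wineBunch from above):

import numpy as np

print(wineBunch.target_names)         # ['class_0' 'class_1' 'class_2']
print(np.bincount(wineBunch.target))  # samples per class: [59 71 48]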

3. Classifying red wine with a Transformer

3.1 Transformer Introduction

The Transformer is a neural network architecture based on the attention mechanism, mainly used for sequence-to-sequence tasks in natural language processing such as machine translation and text summarization. It was proposed by Google in 2017 and was successfully applied in Google Translate.

The Transformer's main characteristic is an encoder-decoder structure built entirely on attention, which avoids the long-range dependency and vanishing-gradient problems of traditional recurrent neural networks such as LSTMs. In addition, the Transformer uses techniques such as residual connections and layer normalization to improve training stability and generalization.

In the Transformer model, the tokens of the input and output sequences are represented as fixed-dimension vectors (embeddings). The model consists of embedding layers plus stacks of encoder and decoder layers, where each encoder and decoder layer contains modules such as multi-head attention, a feed-forward network, and residual connections, enabling effective modeling and transformation of sequences.
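
The core of each encoder layer is scaled dot-product attention. A minimal PyTorch sketch of that computation, for intuition only (the nn.TransformerEncoderLayer used later wraps this in multi-head projections):

import math
import torch

def scaled_dot_product_attention(q, k, v):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # pairwise similarity scores
    weights = torch.softmax(scores, dim=-1)            # attention weights, each row sums to 1
    return weights @ v                                 # weighted sum of the values

q = k = v = torch.randn(1, 5, 13)                      # (batch, sequence length, d_model)
print(scaled_dot_product_attention(q, k, v).shape)     # torch.Size([1, 5, 13])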

3.2 Introducing dependent libraries

import random         # random module, for random number generation
import torch          # PyTorch, for deep learning tasks
import numpy as np    # numpy, for numerical computation
from torch import nn  # neural network module from PyTorch
from sklearn import datasets  # datasets from sklearn
from sklearn.model_selection import train_test_split  # train_test_split, for splitting the data
from sklearn.preprocessing import StandardScaler      # StandardScaler, for data standardization

3.3 Set random seed

# Set random seeds so the model produces the same results on every run
seed_value = 42
random.seed(seed_value)                         # seed Python's random module
np.random.seed(seed_value)                      # seed numpy
torch.manual_seed(seed_value)                   # seed PyTorch on the CPU
#tf.random.set_seed(seed_value)                 # seed TensorFlow (not used here)
if torch.cuda.is_available():                   # if CUDA is available, seed the GPU as well
    torch.cuda.manual_seed(seed_value)          # seed PyTorch on the GPU
    torch.backends.cudnn.deterministic = True   # use deterministic algorithms for reproducible results
    torch.backends.cudnn.benchmark = False      # disable auto-tuning of convolution algorithms

3.4 Detect if GPU is available

# Check whether a GPU is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

3.5 Loading the dataset

# Load the red wine dataset
wine = datasets.load_wine()
X = wine.data
y = wine.target

3.6 Split training set and test set

# Split into a training set (80%) and a test set (20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
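
Note that with only 178 samples, a purely random 20% split can leave the classes unbalanced. A common variant (not what the original code does) is to pass stratify=y and a fixed random_state so the class proportions of the split stay stable:

# Optional: stratified, reproducible split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=seed_value)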

3.7 Scaling data

# Scale the data
scaler = StandardScaler()                # create a standardization transformer
X_train = scaler.fit_transform(X_train)  # fit on the training set (compute mean and standard deviation) and transform it
X_test = scaler.transform(X_test)        # transform the test set with the same mean and standard deviation as the training set
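
After fit_transform, every feature of the training set has approximately zero mean and unit standard deviation, which can be verified directly (a quick sanity check, not part of the original script):

print(X_train.mean(axis=0).round(6))  # ~0 for all 13 features
print(X_train.std(axis=0).round(6))   # ~1 for all 13 features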

3.8 Convert to PyTorch tensors

# Convert the data to PyTorch tensors: features as floats, labels as long integers;
# if a GPU is available, move the tensors to the GPU
X_train = torch.tensor(X_train).float().to(device)
y_train = torch.tensor(y_train).long().to(device)
X_test = torch.tensor(X_test).float().to(device)
y_test = torch.tensor(y_test).long().to(device)
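
The dtypes matter here: CrossEntropyLoss expects float inputs and long (int64) class indices as targets. A quick check of shapes and dtypes (assuming the 80/20 split above, i.e. 142 training samples):

print(X_train.shape, X_train.dtype)  # torch.Size([142, 13]) torch.float32
print(y_train.shape, y_train.dtype)  # torch.Size([142]) torch.int64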

3.9 Define the Transformer model


class TransformerModel(nn.Module):
    def __init__(self, input_size, num_classes):
        super(TransformerModel, self).__init__()
        # Build a Transformer encoder layer; d_model must match the input dimension.
        # batch_first=True makes the layer expect (batch, seq, feature) inputs,
        # matching the shapes in the forward pass below.
        self.encoder_layer = nn.TransformerEncoderLayer(d_model=input_size,  # input dimension
                                                        nhead=1,             # number of attention heads
                                                        batch_first=True)    # input shape (batch, seq, feature)
        # Build the Transformer encoder from the encoder layer and the number of layers
        self.encoder = nn.TransformerEncoder(self.encoder_layer,             # encoder layer
                                             num_layers=1)                   # number of layers
        # Build the linear classification layer: input_size in, num_classes out
        self.fc = nn.Linear(input_size,                                      # input dimension
                            num_classes)                                     # output dimension

    def forward(self, x):
        #print("A:", x.shape)  # torch.Size([142, 13])
        x = x.unsqueeze(1)    # add a dimension: shape becomes (batch_size, 1, input_size)
        #print("B:", x.shape)  # torch.Size([142, 1, 13])
        x = self.encoder(x)   # encode with the Transformer encoder
        #print("C:", x.shape)  # torch.Size([142, 1, 13])
        x = x.squeeze(1)      # squeeze dimension 1: shape becomes (batch_size, input_size)
        #print("D:", x.shape)  # torch.Size([142, 13])
        x = self.fc(x)        # classify with the linear layer
        #print("E:", x.shape)  # torch.Size([142, 3])
        return x
# Initialize the Transformer model and move it to the GPU (if available)
model = TransformerModel(input_size=13,             # input dimension
                         num_classes=3).to(device)  # output dimension
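
Before training, a quick sanity check of the forward pass (a sketch; the batch size of 4 is arbitrary):

with torch.no_grad():
    dummy = torch.randn(4, 13).to(device)  # 4 fake samples with 13 features
    print(model(dummy).shape)              # torch.Size([4, 3]): one logit per class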

3.10 Define the loss function and optimizer

criterion = nn.CrossEntropyLoss()  # define the loss function: cross-entropy loss

optimizer = torch.optim.Adam(model.parameters(),  # model parameters
                             lr=0.01)             # learning rate
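
nn.CrossEntropyLoss takes raw logits of shape (batch, num_classes) plus integer labels of shape (batch,), applying LogSoftmax and NLLLoss internally, so the model needs no softmax layer. A tiny illustration with made-up numbers:

logits = torch.tensor([[2.0, 0.1, -1.0],   # model strongly favors class 0
                       [0.0, 0.0, 0.0]])   # model is undecided
labels = torch.tensor([0, 2])              # true classes
print(nn.CrossEntropyLoss()(logits, labels))  # scalar loss, averaged over the batch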

3.11 Training Model

# Train the model
num_epochs = 100     # train for 100 epochs
for epoch in range(num_epochs):
    # Forward pass: feed the training data through the model to get its outputs
    outputs = model(X_train)
    loss = criterion(outputs, y_train)  # compute the cross-entropy loss

    # Backward pass and optimization: zero the gradients, backpropagate, and update the parameters
    optimizer.zero_grad()  # zero the gradients
    loss.backward()        # backpropagate to compute the gradients
    optimizer.step()       # update the model parameters from the gradients

    # Print the loss every 10 epochs to monitor training
    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}')

3.12 Test Model

# Test the model: run inference on the test set with gradient tracking disabled
with torch.no_grad():
    outputs = model(X_test)   # predict on the test set with the trained model
    _, predicted = torch.max(outputs.data, 1)  # take the argmax to get the most probable class
    accuracy = (predicted == y_test).sum().item() / y_test.size(0)  # compute the accuracy on the test set
    print(f'Test Accuracy: {accuracy:.2f}')    # print the test accuracy
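
Beyond a single accuracy number, a confusion matrix shows which classes get mixed up (a sketch using scikit-learn; the tensors are moved back to the CPU first):

from sklearn.metrics import confusion_matrix, classification_report

y_true = y_test.cpu().numpy()
y_pred = predicted.cpu().numpy()
print(confusion_matrix(y_true, y_pred))       # rows: true class, columns: predicted class
print(classification_report(y_true, y_pred))  # per-class precision / recall / F1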

3.13 Console output

Epoch [10/100], Loss: 0.1346
Epoch [20/100], Loss: 0.0325
Epoch [30/100], Loss: 0.0116
Epoch [40/100], Loss: 0.0064
Epoch [50/100], Loss: 0.0040
Epoch [60/100], Loss: 0.0029
Epoch [70/100], Loss: 0.0026
Epoch [80/100], Loss: 0.0021
Epoch [90/100], Loss: 0.0019
Epoch [100/100], Loss: 0.0019
Test Accuracy: 1.00

Process finished with exit code 0

Accuracy: 100%

3.14 Save the model

# Save the model's parameters (state_dict) to disk
PATH = "model.pt"
torch.save(model.state_dict(), PATH)

3.15 Loading the model

# Load the model: rebuild the architecture first, then load the saved parameters into it
model = TransformerModel(input_size=13, num_classes=3).to(device)
model.load_state_dict(torch.load(PATH))
model.eval()
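
Once loaded, the model can run inference exactly as in section 3.12 (a short sketch, assuming X_test is still available as a tensor on the same device):

with torch.no_grad():
    outputs = model(X_test)               # logits for the test set
    _, predicted = torch.max(outputs, 1)  # predicted class per sample
print(predicted[:5])                      # first five predictions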

Source: blog.csdn.net/programmer589/article/details/130454658