How to use PyTorch to define a multilayer perceptron (MLP) neural network model, with an overview of related model knowledge

# Import the required libraries
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
import torchvision.transforms as transforms
import torchvision.datasets as datasets

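# Define the MLP model
class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        # Build a sequential stack: one flatten layer, three fully connected layers, and ReLU activations
        self.layers = nn.Sequential(
            nn.Flatten(),                       # Flatten each 28x28 image into a 784-dimensional vector
            nn.Linear(28 * 28, 512),            # First fully connected layer, 784 -> 512
            nn.ReLU(),                          # ReLU activation
            nn.Linear(512, 256),                # Second fully connected layer, 512 -> 256
            nn.ReLU(),                          # ReLU activation
            nn.Linear(256, 10)                  # Third fully connected layer, 256 -> 10 (one output per class)
        )

    def forward(self, x):
        return self.layers(x)                   # Forward pass through the layer stack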

# Load the FashionMNIST dataset
# Define the image preprocessing: convert to a tensor, then normalize
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
# Download the FashionMNIST training split and apply the transform
dataset = datasets.FashionMNIST(root="./data", train=True, transform=transform, download=True)

# Split the dataset into training and validation sets
train_len = int(0.8 * len(dataset))           # Use 80% of the samples for training
val_len = len(dataset) - train_len            # Use the remaining 20% for validation
train_dataset, val_dataset = random_split(dataset, [train_len, val_len])

# Create the data loaders
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)  # Training loader: batch size 64, shuffled
val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False)     # Validation loader: batch size 64, not shuffled

# Initialize the model, loss function, and optimizer
model = MLP()                                 # Create an MLP model instance
criterion = nn.CrossEntropyLoss()             # Cross-entropy loss
optimizer = optim.Adam(model.parameters(), lr=0.001)  # Adam optimizer

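# Train the model
epochs = 5                                    # Train for 5 epochs
for epoch in range(epochs):
    model.train()                             # Put the model in training mode
    for inputs, labels in train_loader:       # Fetch batches from the training loader
        outputs = model(inputs)               # Forward pass
        loss = criterion(outputs, labels)     # Compute the loss
        optimizer.zero_grad()                 # Clear gradients from the previous step
        loss.backward()                       # Backward pass: compute gradients
        optimizer.step()                      # Update the weights

    # Evaluate on the validation set at the end of each epoch
    model.eval()                              # Put the model in evaluation mode
    total_correct = 0
    with torch.no_grad():                     # Disable gradient tracking to save memory and compute
        for inputs, labels in val_loader:     # Fetch batches from the validation loader
            outputs = model(inputs)           # Forward pass
            _, predicted = outputs.max(1)     # Predicted class = index of the largest logit
            total_correct += (predicted == labels).sum().item()  # Count correct predictions
    accuracy = total_correct / val_len        # Validation accuracy
    print(f"Epoch {epoch + 1}/{epochs} - Validation accuracy: {accuracy:.4f}")  # Print the validation accuracy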
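
As a quick sanity check on the architecture, the sketch below rebuilds the same layer stack and confirms the output shape and the number of trainable parameters. The random input batch and the variable names (mlp, dummy_batch, num_params) are illustrative assumptions, not part of the original script.

```python
import torch
import torch.nn as nn

# Same layer stack as the MLP class above, rebuilt here so the snippet runs on its own
mlp = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# A fake batch of 4 grayscale images (assumption: random values stand in for real data)
dummy_batch = torch.randn(4, 1, 28, 28)
logits = mlp(dummy_batch)

# Count trainable parameters: weights plus biases of the three Linear layers
# (784*512 + 512) + (512*256 + 256) + (256*10 + 10)
num_params = sum(p.numel() for p in mlp.parameters() if p.requires_grad)

print(logits.shape)  # torch.Size([4, 10])
print(num_params)    # 535818
```

Almost all of the parameters sit in the first Linear layer, which is typical for an MLP applied directly to image pixels.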

nn.Flatten() is a layer that flattens multi-dimensional input into one vector per sample. By default it keeps the first (batch) dimension and merges all remaining dimensions (start_dim=1, end_dim=-1). This is especially common when working with image data, because images are multi-dimensional: a single 28x28 grayscale image is a tensor of shape [28, 28], or [1, 28, 28] once transforms.ToTensor() has added the channel dimension.

Data usually has to be flattened before it reaches a fully connected layer such as nn.Linear, because such a layer expects the last dimension of its input to hold the features and treats the leading dimensions as batch dimensions.

To be more specific, let's look at an example:

Consider a tensor of shape [batch_size, 28, 28], that is, a batch of batch_size images of 28x28 pixels each. Before this batch is passed to an nn.Linear(28*28, 512) layer, each 28x28 image has to be converted into a one-dimensional vector of length 784, so the shape of the input changes from [batch_size, 28, 28] to [batch_size, 784].

nn.Flatten() does this conversion. In the script above, the batches coming from the DataLoader have shape [batch_size, 1, 28, 28], and they are flattened to [batch_size, 784] in the same way.

To summarize: nn.Flatten() collapses every dimension after the batch dimension into one, so that the result can be used as input to a fully connected layer (such as nn.Linear).
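
The shape change described above can be checked directly. This is a minimal sketch using zero-filled tensors as stand-ins for image batches; the batch size of 64 and the variable names are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

flatten = nn.Flatten()  # defaults: start_dim=1, end_dim=-1, so the batch dimension is kept

# Batch of 64 grayscale images with a channel dimension: [batch, channel, height, width]
images = torch.zeros(64, 1, 28, 28)
flat = flatten(images)
print(flat.shape)  # torch.Size([64, 784])

# The same batch without a channel dimension flattens to the same shape
no_channel = torch.zeros(64, 28, 28)
flat_no_channel = flatten(no_channel)
print(flat_no_channel.shape)  # torch.Size([64, 784])

# The functional form torch.flatten gives the same result when start_dim=1 is passed explicitly
functional = torch.flatten(images, start_dim=1)
print(functional.shape)  # torch.Size([64, 784])
```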

  • transforms.Compose:
    This chains multiple image transformations together. It applies each transformation in the list in the order given.

  • transforms.ToTensor():
    This converts a PIL image or a NumPy ndarray to a FloatTensor. For 8-bit images it also rescales pixel values from 0-255 to 0-1 and moves the channel dimension to the front (H x W x C becomes C x H x W). In short, it handles the conversion of data type, layout, and value range.

  • transforms.Normalize((0.5,), (0.5,)):
    This normalizes the tensor image channel by channel as (x - mean) / std. The two arguments are the mean and the standard deviation; both are 0.5 here, with a single value each because the images have one channel.
    With these values, the range [0,1] is mapped to [-1,1].

The purpose of the entire transform is:

  • Convert image data from PIL format to PyTorch tensor format.
  • Convert pixel values from the [0,255] range to the [0,1] range.
  • Further normalize pixel values using the given mean and standard deviation so that they are in the range [-1,1].

Initialize model, loss function and optimizer

  • model = MLP():

    • This instantiates the MLP class defined earlier, creating a multilayer perceptron (MLP) model.
  • criterion = nn.CrossEntropyLoss():

    • Cross-entropy loss (CrossEntropyLoss) is one of the most commonly used loss functions for classification tasks. It measures the difference between the true labels and the predicted class distribution.
    • Note: CrossEntropyLoss applies log-softmax internally, so the model should output raw scores (logits) with no softmax layer of its own.
  • optimizer = optim.Adam(model.parameters(), lr=0.001):

    • The optimizer updates the weights of the model, using the computed gradients, in order to reduce the loss.
    • Adam is a popular optimizer that extends stochastic gradient descent with two ideas: momentum (a running average of the gradients) and per-parameter adaptive learning rates (a running average of the squared gradients).
    • Passing model.parameters() tells the optimizer which weights it should update.
    • lr=0.001 sets the learning rate, a hyperparameter that controls the step size of each weight update.
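
The note about logits can be verified numerically. The sketch below, which uses random logits and labels as assumed stand-ins for real model output, compares nn.CrossEntropyLoss on raw scores with an explicit log-softmax followed by negative log-likelihood.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

logits = torch.randn(8, 10)          # raw scores for 8 samples and 10 classes
labels = torch.randint(0, 10, (8,))  # integer class indices in [0, 9]

# CrossEntropyLoss consumes the raw logits directly
ce = nn.CrossEntropyLoss()(logits, labels)

# Equivalent two-step computation: log-softmax, then negative log-likelihood
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(torch.allclose(ce, manual))  # True
```

Adding an nn.Softmax layer at the end of the model would therefore apply softmax twice during training, which is why the MLP above ends with a plain nn.Linear.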

Frequently asked questions and answers

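  1. Model (in torch.nn):

In addition to the basic MLP, PyTorch provides many predefined layers and models. Common ones include:

Convolutional Neural Networks (CNNs):
    nn.Conv2d: 2D convolution layer, commonly used for image processing.
    nn.Conv3d: 3D convolution layer, commonly used for video or medical imaging.
    nn.MaxPool2d: max pooling layer.

Recurrent Neural Networks (RNNs):
    nn.RNN: basic RNN layer.
    nn.LSTM: long short-term memory network.
    nn.GRU: gated recurrent unit.

Transformer Architecture:
    nn.Transformer: Transformer model, commonly used for natural language processing tasks.

Batch Normalization, Dropout, etc.:
    nn.BatchNorm2d: batch normalization.
    nn.Dropout: a regularization method that helps prevent overfitting.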
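
Several of the layers listed above can be combined into a small convolutional classifier for the same 28x28 inputs. The following is only an illustrative sketch: the channel count, kernel size, and dropout rate are assumptions, not tuned values or part of the original script.

```python
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # [N, 1, 28, 28] -> [N, 16, 28, 28]
    nn.BatchNorm2d(16),                          # normalize each of the 16 channels
    nn.ReLU(),
    nn.MaxPool2d(2),                             # halve height and width -> [N, 16, 14, 14]
    nn.Dropout(p=0.25),                          # randomly zero activations during training only
    nn.Flatten(),                                # -> [N, 16 * 14 * 14]
    nn.Linear(16 * 14 * 14, 10),                 # one logit per class
)

cnn.eval()  # evaluation mode: dropout is disabled, batch norm uses running statistics
with torch.no_grad():
    out = cnn(torch.randn(4, 1, 28, 28))  # fake batch of 4 grayscale images

print(out.shape)  # torch.Size([4, 10])
```

Because the output is again a [batch_size, 10] tensor of logits, such a model could be dropped into the training loop above in place of MLP() without other changes.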
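
  2. Loss function (in torch.nn):

Common loss functions are:

Classification:
    nn.CrossEntropyLoss: cross-entropy loss for classification tasks.
    nn.BCEWithLogitsLoss: binary cross-entropy loss for binary classification, with the sigmoid applied internally.
    nn.MultiLabelSoftMarginLoss: for multi-label classification tasks.

Regression:
    nn.MSELoss: mean squared error, for regression tasks.
    nn.L1Loss: L1 (mean absolute) error.

Generative models:
    nn.KLDivLoss: Kullback-Leibler divergence, commonly used in generative models.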
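
A few of these losses are easy to check by hand. The sketch below uses made-up logits, targets, and predictions (all assumptions for illustration) to show that BCEWithLogitsLoss matches a sigmoid followed by BCELoss, and to compute MSELoss and L1Loss on a three-element example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Binary classification: BCEWithLogitsLoss takes raw logits and applies the sigmoid internally
logits = torch.randn(6)
targets = torch.tensor([1.0, 0.0, 1.0, 1.0, 0.0, 0.0])
with_logits = nn.BCEWithLogitsLoss()(logits, targets)
two_step = nn.BCELoss()(torch.sigmoid(logits), targets)
print(torch.allclose(with_logits, two_step, atol=1e-6))  # True

# Regression: errors are 1, 0, and -2
pred = torch.tensor([2.0, 0.0, 3.0])
true = torch.tensor([1.0, 0.0, 5.0])
mse = nn.MSELoss()(pred, true)  # (1 + 0 + 4) / 3, about 1.6667
l1 = nn.L1Loss()(pred, true)    # (1 + 0 + 2) / 3 = 1.0
print(mse.item(), l1.item())
```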
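
  3. Optimizer (in torch.optim):

Common optimizers are:

optim.SGD: stochastic gradient descent.
optim.Adam: a very popular optimizer that combines features of AdaGrad and RMSProp.
optim.RMSprop: scales each update by a running average of squared gradients; commonly used in deep learning tasks.
optim.Adagrad: an optimizer with per-parameter adaptive learning rates.
optim.Adadelta: similar to Adagrad, but tries to address its rapidly shrinking learning rate.
optim.AdamW: a variant of Adam with decoupled weight decay.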
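
All of these optimizers share the same zero_grad / backward / step interface, so swapping one for another is usually a one-line change. The sketch below illustrates this on a toy quadratic loss; the helper one_step and the starting weights are hypothetical names and values introduced only for this example.

```python
import torch
import torch.optim as optim

def one_step(optimizer_cls, **kwargs):
    """Run a single update on the loss sum(w^2) and return the new weights."""
    w = torch.tensor([1.0, -2.0], requires_grad=True)
    optimizer = optimizer_cls([w], **kwargs)
    loss = (w ** 2).sum()    # gradient of this loss is 2 * w
    optimizer.zero_grad()    # clear old gradients
    loss.backward()          # compute new gradients
    optimizer.step()         # apply the optimizer's update rule
    return w.detach()

# Plain SGD: w <- w - lr * grad = w - 0.1 * 2w = 0.8 * w, so roughly [0.8, -1.6]
sgd_w = one_step(optim.SGD, lr=0.1)
print("SGD", sgd_w)

# The other optimizers are called the same way, here with their default hyperparameters
results = {}
for cls in (optim.Adam, optim.RMSprop, optim.Adagrad, optim.Adadelta, optim.AdamW):
    results[cls.__name__] = one_step(cls)
    print(cls.__name__, results[cls.__name__])
```

Each optimizer moves the weights toward the minimum at zero, but by a different amount, since each one scales the gradient according to its own rule.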

One word per text

Learning is continuous development

Origin blog.csdn.net/weixin_47723732/article/details/133910921