PyTorch Advanced Learning (6): how to optimize and validate a trained model and visualize training metrics such as accuracy and loss; beginner-friendly, highly detailed notes

Course resources: 

7. Visualization of model verification and training process [Pytorch for primary school students] [source code provided]_哔哩哔哩_bilibili

These notes are best read together with the notes from the previous part:

PyTorch Advanced Learning (5): a step-by-step introduction to transfer learning with neural networks, and how to turn a pre-trained model into the model you need - Programmer Sought

  • Training and testing datasets: data (5 classes)
  • Verification set: testdata (more than 20 pictures were randomly extracted from the data dataset)
  • Pre-trained network and weight file: use resnet34 pre-trained weight file, the download address is as follows
https://download.pytorch.org/models/resnet34-333f7ec4.pth

Table of contents

1. Generate dataset CreateDataset.py

1. Code

2. Running results 

2. Pre-training model PreTrainedModel.py

1. Download the pre-trained weight file

2. Use the transfer learning method to modify the resnet34 neural network framework and load the pre-trained weights

3. Model optimization 

3.1 Storage and output of model process data

3.2 Training process

3.3 Testing process

3.4 Running results

4. Code

3. Model Validation

1. Import model structure

2. Load model parameters

3. Load image

4. Verification process

5. Get the result

6. Complete code

 4. Visualization

1. Code

2. Draw graphics 


1. Generate dataset CreateDataset.py

Generate the training, test and validation lists, saved in train.txt, test.txt and eval.txt respectively. These files serve as the model's input: the data loader built later reads its samples from them.

  • test.txt, train.txt: store the image path and label of every image in the test set and training set
  • eval.txt: stores the paths of the validation-set images

1. Code

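'''
    Generate the training set and test set and save them in txt files
'''
# These files act as the model's input; the data loader reads from them later
import os
import random  # used to shuffle the data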

def CreateTrainingSet():
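    # 80% of the data is used as the training set
    train_ratio = 0.8

    # The rest is used as the test set
    test_ratio = 1-train_ratio

    rootdata = r"data"  # root directory of the data

    train_list, test_list = [],[]  # entries collected from every class
    data_list = []

    # Build train.txt and test.txt
    # os.walk visits the root folder first; assuming it holds only class
    # sub-folders and no images, starting at -1 gives the first class label 0
    class_flag = -1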
    for a,b,c in os.walk(rootdata):
        print(a)
        for i in range(len(c)):
            data_list.append(os.path.join(a,c[i]))

        for i in range(0,int(len(c)*train_ratio)):
            train_data = os.path.join(a, c[i])+'\t'+str(class_flag)+'\n'
            train_list.append(train_data)

        for i in range(int(len(c) * train_ratio),len(c)):
            test_data = os.path.join(a, c[i]) + '\t' + str(class_flag)+'\n'
            test_list.append(test_data)

        class_flag += 1

    print(train_list)
    random.shuffle(train_list)  # shuffle the order
    random.shuffle(test_list)

    with open('train.txt','w',encoding='UTF-8') as f:
        for train_img in train_list:
            f.write(str(train_img))

    with open('test.txt','w',encoding='UTF-8') as f:
        for test_img in test_list:
            f.write(test_img)

def CreateEvalData():
    data_list = []
    test_root = r"testdata"
    for a, b, c in os.walk(test_root):
        for i in range(len(c)):
            data_list.append(os.path.join(a, c[i]))
    print(data_list)
    with open('eval.txt', 'w', encoding='UTF-8') as f:
        for test_img in data_list:
            f.write(test_img + '\t' + "0" + '\n')

if __name__ == "__main__":
    CreateEvalData()
    CreateTrainingSet()
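
The listing above assigns labels in the order in which os.walk visits the class folders, and that order is not guaranteed to be alphabetical on every system. As a hedged alternative, the sketch below derives labels from the sorted folder names and shuffles each class before splitting. The helper name split_by_class and its parameters are hypothetical and not part of the original script.

```python
import os
import random


def split_by_class(root, train_ratio=0.8, seed=0):
    """Return (class_names, train, test); train/test hold (path, label) pairs.

    Labels follow the alphabetical order of the class sub-folders, so they
    do not depend on the order in which the file system lists them.
    """
    rng = random.Random(seed)
    class_names = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    train, test = [], []
    for label, name in enumerate(class_names):
        folder = os.path.join(root, name)
        files = sorted(os.listdir(folder))
        rng.shuffle(files)  # shuffle first so the split is random per class
        cut = int(len(files) * train_ratio)
        train += [(os.path.join(folder, f), label) for f in files[:cut]]
        test += [(os.path.join(folder, f), label) for f in files[cut:]]
    return class_names, train, test
```

Calling split_by_class("data") would return the class names together with the two lists, which could then be written out as tab-separated lines in the same way as above.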

2. Running results 

Three txt files are generated.

Each line in eval.txt consists of an image path followed by 0. The 0 is a placeholder label: it gives eval.txt the same layout as train.txt and test.txt (path first, label second), so that all three files can be parsed in the same way later.
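
Because all three files share the "path, tab, label" layout, a single reader can handle them. The LoadData class used later is not shown in these notes, so the function below is only a minimal sketch of how such lines could be parsed; read_list_file is a hypothetical name.

```python
def read_list_file(path):
    """Read 'image_path<TAB>label' lines into a list of (path, int) pairs."""
    samples = []
    with open(path, encoding="UTF-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            # Split on the last tab so a tab inside the path would not break parsing
            img_path, label = line.rsplit("\t", 1)
            samples.append((img_path, int(label)))
    return samples
```

For eval.txt every returned label is the placeholder 0, which the validation code simply ignores.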

 

2. Pre-training model PreTrainedModel.py

1. Download the pre-trained weight file

Download the resnet34 pre-trained parameters from the URL given above, rename the file to resnet34_pretrain.pth, and save it in the project folder.

2. Use the transfer learning method to modify the resnet34 neural network framework and load the pre-trained weights

  1. The dataset has 5 classes, so the fully connected (fc) layer must output 5 values. The fc layer of the resnet34 network as constructed outputs 1000 values (it was trained on a dataset with 1000 classes), so the output size of the fc layer has to be changed to 5;
  2. Delete the fc-layer parameters from the resnet34 pre-trained weights;
  3. Load the remaining weights into the network we built, updating its parameters;
  4. Freeze all layers except the fc layer, so that only the fc-layer parameters are trained;
  5. Use the loss function and gradient descent to train the parameters of the fc layer.

See the previous notes for details: PyTorch Advanced Learning (5): a step-by-step introduction to transfer learning with neural networks, and how to turn a pre-trained model into the model you need
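
Steps 2 to 4 are essentially dictionary operations on parameter names. The sketch below illustrates them with plain Python dicts standing in for state_dicts and strings standing in for tensors, so it runs without PyTorch. The helper names merge_pretrained and trainable_names are hypothetical; the real code appears in the full listing further down.

```python
def merge_pretrained(own_state, pretrained_state, skip_prefix="fc."):
    """Copy pre-trained entries into own_state, except the classifier head."""
    kept = {
        k: v
        for k, v in pretrained_state.items()
        if k in own_state and not k.startswith(skip_prefix)
    }
    merged = dict(own_state)  # start from the freshly initialised parameters
    merged.update(kept)       # overwrite everything except the fc layer
    return merged


def trainable_names(state, head_prefix="fc."):
    """Names that stay trainable after freezing everything but the head."""
    return [k for k in state if k.startswith(head_prefix)]


# Strings stand in for tensors in this illustration
pretrained = {"conv1.weight": "pretrained", "fc.weight": "1000-way", "fc.bias": "1000-way"}
own = {"conv1.weight": "random", "fc.weight": "5-way", "fc.bias": "5-way"}

merged = merge_pretrained(own, pretrained)
print(merged["conv1.weight"])   # pretrained
print(merged["fc.weight"])      # 5-way
print(trainable_names(merged))  # ['fc.weight', 'fc.bias']
```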

3. Model optimization 

3.1 Storage and output of model process data

Set epochs=50. During training:

  • In every epoch, the training loss and the accuracy and average loss on the test set are recorded and written to the file mobilenet_36_traindata.txt;
  • Every 10 epochs the weights are saved to a resnet_epoch_xx_acc_xx.pth file, whose name carries the epoch number and accuracy. With 50 epochs this gives 5 such files: resnet_epoch_10_acc_xx.pth, resnet_epoch_20_acc_xx.pth, and so on;
  • Whenever the accuracy of an epoch exceeds the best accuracy so far, a BEST_resnet_epoch_xx_acc_xx.pth file is saved to record the current best accuracy.

    # Train for 50 epochs in total
    epochs = 50
    best = 0.0
    for t in range(epochs):
        print(f"Epoch {t + 1}\n-------------------------------")
        train_loss = train(train_dataloader, model, loss_fn, optimizer)
        accuracy, avg_loss = test(test_dataloader, model)
        # Record this epoch's values in mobilenet_36_traindata.txt
        write_result("mobilenet_36_traindata.txt", t+1, train_loss, avg_loss, accuracy)

        # Save a resnet_epoch_xx_acc_xx.pth file every 10 epochs
        if (t+1) % 10 == 0:
            torch.save(model.state_dict(), "resnet_epoch_"+str(t+1)+"_acc_"+str(accuracy)+".pth")

        # If this epoch beats the best accuracy so far, save a
        # BEST_resnet_epoch_xx_acc_xx.pth file recording the new best accuracy
        if float(accuracy) > best:
            best = float(accuracy)
            torch.save(model.state_dict(), "BEST_resnet_epoch_" + str(t+1) + "_acc_" + str(accuracy) + ".pth")
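
The saving rule can be traced without training anything. The sketch below is a hypothetical helper (checkpoint_names is not part of the original script) that lists the file names the loop would write for a given sequence of per-epoch accuracies.

```python
def checkpoint_names(accuracies, every=10):
    """Return the checkpoint file names written for per-epoch accuracies."""
    names = []
    best = 0.0
    for epoch, acc in enumerate(accuracies, start=1):
        if epoch % every == 0:
            # periodic checkpoint
            names.append(f"resnet_epoch_{epoch}_acc_{acc}.pth")
        if acc > best:
            # new best accuracy so far
            best = acc
            names.append(f"BEST_resnet_epoch_{epoch}_acc_{acc}.pth")
    return names


# Three epochs, with a periodic checkpoint every 2 epochs for brevity
for name in checkpoint_names([50.0, 60.0, 55.0], every=2):
    print(name)
# BEST_resnet_epoch_1_acc_50.0.pth
# resnet_epoch_2_acc_60.0.pth
# BEST_resnet_epoch_2_acc_60.0.pth
```

Note that a new BEST file is written for every improvement, so several BEST files can accumulate; the one with the highest epoch number holds the best weights.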

3.2 Training process

The train function returns the loss averaged over all batches of one epoch.

def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    avg_total = 0.0
    # Read from the data loader: batch (the batch index), X (image data), y (true labels)
    for batch, (X, y) in enumerate(dataloader):
        # Move the data to the GPU
        X, y = X.cuda(), y.cuda()
        # Get the prediction pred
        pred = model(X)
        # Compute the prediction error
        loss = loss_fn(pred, y)
        avg_total = avg_total+loss.item()

        # Backpropagate and update the model parameters
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Print the current state every 10 batches
        if batch % 10 == 0:
            loss, current = loss.item(), batch * len(X)
            # Print the current loss and the training progress: loss is shown
            # as a float, current is the number of samples processed so far and
            # size is the total number of samples. In the format spec, ">"
            # right-aligns the value and the number is the minimum width.
            print(f"loss: {loss:>5f}  [{current:>5d}/{size:>5d}]")

    # Average loss per batch over this epoch
    avg_loss = f"{(avg_total / len(dataloader)):>5f}"
    return avg_loss
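
As a small PyTorch-free illustration of what an average loss per batch means: accumulate the loss of every batch and divide the running total by the number of batches. The function name epoch_average and the loss values are made up for the example.

```python
def epoch_average(batch_losses):
    """Average of the per-batch losses collected during one epoch."""
    if not batch_losses:
        raise ValueError("no batches were processed")
    total = 0.0
    for loss in batch_losses:
        total += loss  # same running total as avg_total in train()
    return total / len(batch_losses)


# Three batches with made-up loss values
print(epoch_average([1.5, 1.0, 0.5]))  # 1.0
```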

3.3 Testing process

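The test function returns the accuracy and loss on the test set.

def test(dataloader, model):
    size = len(dataloader.dataset)
    # Switch the model to evaluation mode
    model.eval()
    # Initialise test_loss and correct, which accumulate the error and the hits
    test_loss, correct = 0, 0
    # Parameters are not updated during testing, so use no_grad()
    # (used for inference, not for training)
    with torch.no_grad():
        # Iterate over the data loader to get X (image data) and y (true labels)
        for X, y in dataloader:
            # Move the data to the GPU
            X, y = X.cuda(), y.cuda()
            # Pass the images through the model to get the prediction pred
            pred = model(X)
            # Measure the gap between the prediction pred and the true value y
            test_loss += loss_fn(pred, y).item()
            # Count the correct predictions
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    # Note: this divides the summed batch losses by the number of samples, so
    # the value is roughly the mean batch loss divided by the batch size
    test_loss /= size
    correct /= size
    accuracy = f"{(100*correct):>0.1f}"
    avg_loss = f"{test_loss:>8f}"
    print(f"correct = {correct}, Test Error: \n Accuracy: {accuracy}%, Avg loss: {avg_loss} \n")
    # Return the values so they can be written to the log file
    return accuracy, avg_loss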
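
The accuracy calculation boils down to "take the index of the largest output per image and compare it with the label". The sketch below reproduces that logic in plain Python on a few made-up logits. accuracy_percent is a hypothetical name, and in the real code pred.argmax(1) does the same job on tensors.

```python
def accuracy_percent(logits, labels):
    """Percentage of rows whose largest logit sits at the true label's index."""
    correct = 0
    for row, label in zip(logits, labels):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax of one row
        correct += int(predicted == label)
    return 100.0 * correct / len(labels)


# Four made-up predictions over three classes; the third one is wrong
logits = [[2.0, 0.1, 0.3], [0.2, 0.1, 1.5], [0.9, 1.1, 0.4], [0.3, 0.2, 0.1]]
labels = [0, 2, 0, 0]
print(f"{accuracy_percent(logits, labels):.1f}")  # 75.0
```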

3.4 Running results

  • With epochs=50, training takes a while to finish. Parameter files prefixed with BEST are generated as the accuracy improves during training. The most accurate epoch is the 50th, with acc=87.1%; this set of parameters can be used later as the network weights for validating the model.

  • Weight files are generated for epochs 10/20/30/40/50.

  • mobilenet_36_traindata.txt is generated; it stores the training information of every epoch.

4. Code

'''
    Record training information, including:
    1. train loss
    2. test loss
    3. test accuracy
'''
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision.models import resnet34
from utils import LoadData, write_result

def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    avg_total = 0.0
    # Read from the data loader: batch (the batch index), X (image data), y (true labels)
    for batch, (X, y) in enumerate(dataloader):
        # Move the data to the GPU
        X, y = X.cuda(), y.cuda()
        # Get the prediction pred
        pred = model(X)
        # Compute the prediction error
        loss = loss_fn(pred, y)
        avg_total = avg_total+loss.item()

        # Backpropagate and update the model parameters
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Print the current state every 10 batches
        if batch % 10 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>5f}  [{current:>5d}/{size:>5d}]")

    # Average loss per batch over this epoch
    avg_loss = f"{(avg_total / len(dataloader)):>5f}"
    return avg_loss

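def test(dataloader, model):
    size = len(dataloader.dataset)
    # Switch the model to evaluation mode
    model.eval()
    # Initialise test_loss and correct, which accumulate the error and the hits
    test_loss, correct = 0, 0
    # Parameters are not updated during testing, so use no_grad()
    # (used for inference, not for training)
    with torch.no_grad():
        # Iterate over the data loader to get X (image data) and y (true labels)
        for X, y in dataloader:
            # Move the data to the GPU
            X, y = X.cuda(), y.cuda()
            # Pass the images through the model to get the prediction pred
            pred = model(X)
            # Measure the gap between the prediction pred and the true value y
            test_loss += loss_fn(pred, y).item()
            # Count the correct predictions
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    # Note: this divides the summed batch losses by the number of samples, so
    # the value is roughly the mean batch loss divided by the batch size
    test_loss /= size
    correct /= size
    accuracy = f"{(100*correct):>0.1f}"
    avg_loss = f"{test_loss:>8f}"
    print(f"correct = {correct}, Test Error: \n Accuracy: {accuracy}%, Avg loss: {avg_loss} \n")
    # Return the values so they can be written to the log file
    return accuracy, avg_loss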

if __name__ == '__main__':
    batch_size = 32

    # Create one data loader for the training set and one for the test set
    train_data = LoadData("train.txt", True)
    valid_data = LoadData("test.txt", False)

    train_dataloader = DataLoader(dataset=train_data, num_workers=4, pin_memory=True, batch_size=batch_size, shuffle=True)
    test_dataloader = DataLoader(dataset=valid_data, num_workers=4, pin_memory=True, batch_size=batch_size)

    # Train on the GPU if one is available
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using {device} device")

    '''
            Replace the last layer of the ResNet34 model
    '''
    pretrain_model = resnet34(pretrained=False)
    num_ftrs = pretrain_model.fc.in_features    # number of inputs to the fully connected layer
    pretrain_model.fc = nn.Linear(num_ftrs, 5)  # new fully connected layer with 5 outputs

    # Pre-trained parameters, from 'https://download.pytorch.org/models/resnet34-333f7ec4.pth'
    pretrained_dict = torch.load('./resnet34_pretrain.pth')

    # Drop the fc-layer parameters
    pretrained_dict.pop('fc.weight')
    pretrained_dict.pop('fc.bias')

    # Parameters of our own model; they are still in their initial state, so many are 0 or 1
    model_dict = pretrain_model.state_dict()

    # Keep only the pre-trained parameters that exist in our model
    pretrained_dict = {k: v for k, v in pretrained_dict.items() if k in model_dict}

    # Update our parameter dictionary with the pre-trained values
    model_dict.update(pretrained_dict)

    # Load the updated parameter dictionary into the modified model
    pretrain_model.load_state_dict(model_dict)

    '''
        Freeze part of the layers
    '''
    # Set requires_grad to False for every parameter outside the fc layer
    for name, value in pretrain_model.named_parameters():
        if (name != 'fc.weight') and (name != 'fc.bias'):
            value.requires_grad = False
    #
    # filter selects the parameters whose requires_grad is True
    params_conv = filter(lambda p: p.requires_grad, pretrain_model.parameters())    # the parameters to update are in params_conv

    model = pretrain_model.to(device)

    # Loss function: cross entropy measures how far the predictions are from the labels
    loss_fn = nn.CrossEntropyLoss()

    '''   Make the optimizer update only the layers that need training  '''
    optimizer = torch.optim.SGD(params_conv, lr=1e-3)  # initial learning rate
    #
    # Train for 50 epochs in total
    epochs = 50
    best = 0.0
    for t in range(epochs):
        print(f"Epoch {t + 1}\n-------------------------------")
        train_loss = train(train_dataloader, model, loss_fn, optimizer)
        accuracy, avg_loss = test(test_dataloader, model)
        # Record this epoch's values in mobilenet_36_traindata.txt
        write_result("mobilenet_36_traindata.txt", t+1, train_loss, avg_loss, accuracy)

        # Save a resnet_epoch_xx_acc_xx.pth file every 10 epochs
        if (t+1) % 10 == 0:
            torch.save(model.state_dict(), "resnet_epoch_"+str(t+1)+"_acc_"+str(accuracy)+".pth")

        # If this epoch beats the best accuracy so far, save a
        # BEST_resnet_epoch_xx_acc_xx.pth file recording the new best accuracy
        if float(accuracy) > best:
            best = float(accuracy)
            torch.save(model.state_dict(), "BEST_resnet_epoch_" + str(t+1) + "_acc_" + str(accuracy) + ".pth")

    print("Train PyTorch Model Success!")

3. Model Validation

Use the trained network to classify the images in the validation set.

1. Import model structure

Define the resnet34 network with the modified fc output layer.

    '''
    1. Import the model structure
    '''
    # Set up our own model
    model = resnet34(pretrained=False)
    num_ftrs = model.fc.in_features    # number of inputs to the fully connected layer
    model.fc = nn.Linear(num_ftrs, 5)  # fully connected layer with 5 outputs
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using {device} device")

2. Load model parameters

Take the weight file with the highest accuracy from training (in my case "./BEST_resnet_epoch_50_acc_87.1.pth"), load its parameters into the network, and then move the model to the selected device.

    '''
    2. Load the model parameters
    '''
    # Use the set of weights with the best accuracy
    model_loc = "./BEST_resnet_epoch_50_acc_87.1.pth"
    model_dict = torch.load(model_loc)
    model.load_state_dict(model_dict)
    # Move the model to the selected device (cuda if available)
    model = model.to(device)

3. Load image

Use LoadData and DataLoader to load the images in the validation set.

    '''
    3. Load the images
    '''
    # Load the images of the validation set
    valid_data = LoadData("eval.txt", train_flag=False)
    test_dataloader = DataLoader(dataset=valid_data, num_workers=2, pin_memory=True, batch_size=1)

4. Verification process

Store the predicted label and its probability for every image in the validation set in two lists, label_list and likelihood_list.

def eval(dataloader, model):
    label_list = []
    likelihood_list = []
    model.eval()
    with torch.no_grad():
        # Iterate over the data loader to get X (image data) and y (placeholder label)
        for X, y in dataloader:
            # Move the data to the GPU
            X = X.cuda()
            # Pass the image through the model to get the prediction pred
            pred = model(X)
            # Index of the most likely label
            label = torch.softmax(pred,1).cpu().numpy().argmax()
            label_list.append(label)
            # Value of the largest probability
            likelihood = torch.softmax(pred,1).cpu().numpy().max()
            likelihood_list.append(likelihood)
        return label_list,likelihood_list
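
To make the label and likelihood values concrete, the sketch below computes a softmax over one row of raw model outputs in plain Python and picks the most likely class. The function names and the example logits are made up; in the listing above torch.softmax performs the same calculation on tensors.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits):
    """Return (index of the most likely class, its probability)."""
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs[label]


label_names = ["daisy", "dandelion", "rose", "sunflower", "tulip"]
label, likelihood = top_prediction([0.2, 0.1, 3.0, 0.5, 0.4])  # made-up logits
print(label_names[label], round(likelihood, 3))
```

With batch_size=1 each forward pass yields exactly one such row, which is why calling argmax() and max() on the whole array works in the listing above.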

5. Get the result

Convert the label indices in the label list into the corresponding class names, use pandas to build a table of the class and probability for each image, print it, and save the table to a csv file.

    '''
    4. Get the results
    '''
    #
    label_list, likelihood_list =  eval(test_dataloader, model)
    label_names = ["daisy", "dandelion","rose","sunflower","tulip"]

    result_names = [label_names[i] for i in label_list]

    list = [result_names, likelihood_list]
    df = pd.DataFrame(data=list)
    df2 = pd.DataFrame(df.values.T, columns=["label", "likelihood"])
    print(df2)
    # Save the predictions with pandas
    df2.to_csv('testdata.csv', encoding='gbk')

The result of pycharm console output: 

Prediction table saved in testdata.csv file: 

6. Complete code

'''
    1. Validation of a single image
    2. Validation of multiple images
'''
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision.models import resnet34
from utils import LoadData, write_result
import pandas as pd


def eval(dataloader, model):
    label_list = []
    likelihood_list = []
    model.eval()
    with torch.no_grad():
        # Iterate over the data loader to get X (image data) and y (placeholder label)
        for X, y in dataloader:
            # Move the data to the GPU
            X = X.cuda()
            # Pass the image through the model to get the prediction pred
            pred = model(X)


            # Index of the most likely label
            label = torch.softmax(pred,1).cpu().numpy().argmax()
            label_list.append(label)
            # Value of the largest probability
            likelihood = torch.softmax(pred,1).cpu().numpy().max()
            likelihood_list.append(likelihood)
        return label_list,likelihood_list


if __name__ == "__main__":

    '''
    1. Import the model structure
    '''
    # Set up our own model
    model = resnet34(pretrained=False)
    num_ftrs = model.fc.in_features    # number of inputs to the fully connected layer
    model.fc = nn.Linear(num_ftrs, 5)  # fully connected layer with 5 outputs
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using {device} device")

    '''
    2. Load the model parameters
    '''
    # Use the set of weights with the best accuracy
    model_loc = "./BEST_resnet_epoch_50_acc_87.1.pth"
    model_dict = torch.load(model_loc)
    model.load_state_dict(model_dict)
    # Move the model to the selected device (cuda if available)
    model = model.to(device)

    '''
    3. Load the images
    '''
    # Load the images of the validation set
    valid_data = LoadData("eval.txt", train_flag=False)
    test_dataloader = DataLoader(dataset=valid_data, num_workers=2, pin_memory=True, batch_size=1)


    '''
    4. Get the results
    '''
    #
    label_list, likelihood_list =  eval(test_dataloader, model)
    label_names = ["daisy", "dandelion","rose","sunflower","tulip"]

    result_names = [label_names[i] for i in label_list]

    list = [result_names, likelihood_list]
    df = pd.DataFrame(data=list)
    df2 = pd.DataFrame(df.values.T, columns=["label", "likelihood"])
    print(df2)
    # Save the predictions with pandas
    df2.to_csv('testdata.csv', encoding='gbk')

 4. Visualization

Use the mobilenet_36_traindata.txt file saved during training. For every epoch it stores the training loss (TrainLoss), the test loss (TestLoss) and the test accuracy (TestAccuracy).

1. Code

import matplotlib.pyplot as plt
import numpy as np

# Plot the charts

def getdata(data_loc):
    epoch_list = []
    train_loss_list = []
    test_loss_list = []
    acc_list = []
    with open(data_loc, "r") as f:
        for i in f.readlines():
            data_i = i.split("\t")
            epoch_i = float(data_i[0][7:])
            train_loss_i = float(data_i[1][10:])
            test_loss_i = float(data_i[2][9:])
            acc_i = float(data_i[3][13:])
            epoch_list.append(epoch_i)
            train_loss_list.append(train_loss_i)
            test_loss_list.append(test_loss_i)
            acc_list.append(acc_i)
        print(len(epoch_list), len(train_loss_list))
        return epoch_list, train_loss_list, test_loss_list, acc_list



if __name__ == "__main__":
    data_loc = r"mobilenet_36_traindata.txt"
    epoch_list, train_loss_list, test_loss_list, acc_list = getdata(data_loc)

    # Train loss curve
    # plt.plot(epoch_list, train_loss_list)
    #
    # plt.legend(["model"])
    # plt.xticks(np.arange(0, 50, 5))  # x-axis tick values and step
    # plt.yticks(np.arange(0, 100, 10))  # y-axis tick values and step
    # plt.xlabel("Epoch")
    # plt.ylabel("train_loss")
    # plt.title("Train Loss")
    # plt.show()

    # Accuracy curve
    # plt.plot(epoch_list, acc_list)
    #
    # plt.legend(["model"])
    # plt.xticks(np.arange(0, 50, 5))  # x-axis tick values and step
    # plt.yticks(np.arange(0, 100, 10))  # y-axis tick values and step
    # plt.xlabel("Epoch")
    # plt.ylabel("Accuracy (%)")
    # plt.title("Model Accuracy")
    # plt.show()

    # Test loss curve
    plt.plot(epoch_list, test_loss_list)

    plt.legend(["model"])
    plt.xticks(np.arange(0, 50, 5))  # x-axis tick values and step
    plt.yticks(np.linspace(0, 1, 11))  # y-axis ticks from 0 to 1 in steps of 0.1
    plt.xlabel("Epoch")
    plt.ylabel("test_loss")
    plt.title("Test Loss")
    plt.show()
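
The getdata function above slices each field at a fixed character offset, so it only works if the log lines have exactly the expected prefixes. The write_result helper is not shown in these notes, so the line format below ("Name:value" fields separated by tabs) is an assumption inferred from those offsets. Under that assumption, a more tolerant parser could split on the colon instead; parse_log_line is a hypothetical name.

```python
def parse_log_line(line):
    """Split tab-separated 'name:value' fields into a dict of floats."""
    record = {}
    for field in line.strip().split("\t"):
        name, value = field.split(":", 1)  # split at the first colon only
        record[name.strip()] = float(value)  # float() ignores surrounding spaces
    return record


# Assumed log line format; the real one depends on write_result in utils
sample = "Epoch: 1\tTrainLoss:0.52\tTestLoss:0.31\tTestAccuracy:80.5\n"
print(parse_log_line(sample))
```

The resulting dicts can be collected per line and turned into the same four lists that getdata returns.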

2. Draw graphics 

  • The accuracy curve:

  • The train loss curve:

  • The test loss curve (the ordinate runs from 0 to 1):


Origin blog.csdn.net/weixin_45662399/article/details/130121114