Building a flower classifier with AlexNet in PyTorch

1. AlexNet network structure

2. model.py

Detailed explanation of building a network

1. nn.Sequential()

Unlike demo1, this model uses nn.Sequential(), which packs a series of layers into a single module. The main advantage is simpler code: one call runs all the layers in order.
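A minimal sketch of what nn.Sequential does; the layer sizes and the 32 x 32 dummy input are arbitrary choices for illustration, not values from this post:

```python
import torch
import torch.nn as nn

# Three layers packed into one module; calling it runs them in order.
block = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
)

x = torch.randn(1, 3, 32, 32)   # dummy batch in [N, C, H, W] layout
y = block(x)
print(y.shape)                  # torch.Size([1, 8, 16, 16])
print(block[0])                 # sub-layers can be indexed like a list
```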

2. The padding parameter of PyTorch's convolution layer Conv2d()

padding = 1 pads one row of zeros at the top and bottom and one column of zeros at the left and right.

padding = (2, 1) pads two rows of zeros at the top and bottom and one column of zeros at the left and right.

For asymmetric padding, use nn.ZeroPad2d(). Its argument order is (left, right, top, bottom), so nn.ZeroPad2d((1, 2, 1, 2)) adds one column on the left, two columns on the right, one row at the top, and two rows at the bottom.

If, after padding, the kernel and stride do not divide the input evenly, PyTorch rounds the output size down, which in effect discards the leftover rows at the bottom and columns at the right.
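The three padding cases can be checked by looking at output shapes. This is a small sketch on an arbitrary 8 x 8 single-channel input:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 8, 8)

# padding=(2, 1): two rows top and bottom, one column left and right
conv = nn.Conv2d(1, 1, kernel_size=3, padding=(2, 1))
shape_tuple_pad = tuple(conv(x).shape)    # H: 8 + 4 - 3 + 1 = 10, W: 8 + 2 - 3 + 1 = 8

# ZeroPad2d takes (left, right, top, bottom)
pad = nn.ZeroPad2d((1, 2, 1, 2))
shape_zero_pad = tuple(pad(x).shape)      # H: 8 + 1 + 2 = 11, W: 8 + 1 + 2 = 11

# kernel 3, stride 2 on 8 x 8: (8 - 3) / 2 + 1 = 3.5, rounded down to 3
strided = nn.Conv2d(1, 1, kernel_size=3, stride=2)
shape_strided = tuple(strided(x).shape)

print(shape_tuple_pad, shape_zero_pad, shape_strided)
```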

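3. nn.ReLU(inplace=True)

inplace=True makes ReLU overwrite its input tensor instead of allocating a new tensor for the output, which reduces memory use. The original input values are lost, so use it only when nothing else needs them.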
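A short sketch of the difference, using a hand-written three-element tensor as an example:

```python
import torch
import torch.nn as nn

a = torch.tensor([-1.0, 2.0, -3.0])
out = nn.ReLU(inplace=False)(a)          # a is left untouched, out is a new tensor

b = torch.tensor([-1.0, 2.0, -3.0])
out_inplace = nn.ReLU(inplace=True)(b)   # b itself is overwritten

# The in-place result reuses the input's memory instead of allocating new storage.
same_storage = out_inplace.data_ptr() == b.data_ptr()
print(a, out, b, same_storage)
```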

4. nn.Dropout()

Randomly zeroes a fraction of the activations during training to reduce overfitting; the argument is the probability of dropping each element. It does not change the number of parameters.
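As a rough illustration (the all-ones input and the seed are arbitrary), the sketch below shows that dropout zeroes elements only in training mode, rescales the survivors by 1 / (1 - p), and has no parameters of its own:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()
y_train = drop(x)   # each element is either 0 or 1 / (1 - 0.5) = 2

drop.eval()
y_eval = drop(x)    # in eval mode dropout passes the input through unchanged

n_params = sum(p.numel() for p in drop.parameters())
print(y_train.unique(), torch.equal(y_eval, x), n_params)
```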

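5. isinstance()

Checks whether an object is an instance of a given type. The model uses it during weight initialization to tell convolutional layers from fully connected layers.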

model.py code

import torch.nn as nn
import torch
 
class AlexNet(nn.Module):
    def __init__(self, num_classes=1000, init_weights=False):
        super(AlexNet, self).__init__()
        # Unlike demo1, nn.Sequential packs a series of layers into one module, which simplifies the code
        # features extracts the image features
        self.features = nn.Sequential(
            nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2),  # [3, 224, 224] -> [48, 55, 55]
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(48, 128, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),  # -> [128, 13, 13]
            nn.Conv2d(128, 192, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 192, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),  # dropout to reduce overfitting
            nn.Linear(128 * 6 * 6, 2048),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(2048, 2048),
            nn.ReLU(inplace=True),
            nn.Linear(2048, num_classes),
        )
        if init_weights:
            self._initialize_weights()
    def forward(self, x):
        x = self.features(x)
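        x = torch.flatten(x, start_dim=1)   # flatten from dim 1 (the channel dim) onward, keeping the batch dim
        x = self.classifier(x)              # class scores (logits)
        return x
    # Weight initialization: isinstance tells whether the current layer is convolutional or fully connected,
    # and each case is initialized differently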
    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)
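
The 128 * 6 * 6 input size of the first Linear layer can be checked with the usual output-size formula, floor((size + 2 * padding - kernel) / stride) + 1. The sketch below assumes a 224 x 224 input, stride 4 on the first convolution, and kernel 3 with stride 2 for every pooling layer (the standard AlexNet layout). out_size and alexnet_feature_size are helper names made up for this illustration.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (rounded down)."""
    return (size + 2 * padding - kernel) // stride + 1


def alexnet_feature_size(size=224):
    size = out_size(size, 11, 4, 2)  # conv1 -> 55
    size = out_size(size, 3, 2)      # pool1 -> 27
    size = out_size(size, 5, 1, 2)   # conv2 -> 27
    size = out_size(size, 3, 2)      # pool2 -> 13
    size = out_size(size, 3, 1, 1)   # conv3 -> 13
    size = out_size(size, 3, 1, 1)   # conv4 -> 13
    size = out_size(size, 3, 1, 1)   # conv5 -> 13
    size = out_size(size, 3, 2)      # pool3 -> 6
    return size


side = alexnet_feature_size()
print(side, 128 * side * side)       # 6 4608
```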

3. train.py

1. device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

Function: train on the GPU if one is available; otherwise fall back to the CPU.

2. dataset = datasets.ImageFolder(root, transform)

root is the image root directory and transform is the preprocessing function. The returned dataset has three attributes:

1) dataset.classes: a list of the category names

2) dataset.class_to_idx: a dictionary mapping each category name to its index

3) dataset.imgs: a list of (image path, class index) tuples

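print(dataset.classes)  # class names, taken from the subfolder names
print(dataset.class_to_idx)  # indices 0, 1, ... assigned to the classes in sorted order
print(dataset.imgs)  # path and class index of every image found in the folders
 
Output: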
['cat',  'dog']
{'cat':  0, 'dog':  1}
[('./data/train\\cat\\1.jpg',  0), 
 ('./data/train\\cat\\2.jpg', 0), 
 ('./data/train\\dog\\1.jpg', 1), 
 ('./data/train\\dog\\2.jpg', 1)]
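
The class indices come from the sorted subfolder names. The sketch below imitates that step with the standard library only; find_classes here is a stand-in written for this illustration, and the two empty folders are created in a temporary directory:

```python
import os
import tempfile


def find_classes(root):
    """Sorted subfolder names become the classes, numbered 0, 1, ..."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}
    return classes, class_to_idx


with tempfile.TemporaryDirectory() as root:
    for name in ("dog", "cat"):          # creation order does not matter
        os.makedirs(os.path.join(root, name))
    classes, class_to_idx = find_classes(root)

print(classes)        # ['cat', 'dog']
print(class_to_idx)   # {'cat': 0, 'dog': 1}
```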

3. Saving the class indices as JSON

json_str = json.dumps(cla_list, indent=4)
with open('class_indices.json', 'w') as json_file:
    json_file.write(json_str)

Function: encode the dictionary as JSON and save it to a file. indent=4 pretty-prints the output with four spaces in front of each entry.
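
One detail worth seeing in a small sketch: JSON object keys are always strings, so integer indices come back as "0", "1", and so on, which is why predict.py looks up class_indict[str(predict_cla)]. The two-class dictionary below is example data:

```python
import json

class_to_idx = {"cat": 0, "dog": 1}                        # example mapping
idx_to_class = {idx: name for name, idx in class_to_idx.items()}

json_str = json.dumps(idx_to_class, indent=4)
print(json_str)

loaded = json.loads(json_str)
print(loaded)   # {'0': 'cat', '1': 'dog'}: the integer keys became strings
```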

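4. torchvision.utils.make_grid(images, padding=0)

Function: stitch multiple images into one image. padding sets the width, in pixels, of the border placed around each image.

(Figures: the stitched grid with padding = 0 and with padding = 5.)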

5. net.train() and net.eval() control how dropout behaves

PyTorch modules have two modes, train() and eval(), used for training and validation respectively. For most layers the two modes behave identically; only layers such as dropout and batch normalization behave differently.

If the model contains BN or dropout layers, call net.train() before training so that batch normalization and dropout are active, and call net.eval() before validation or prediction so that they are switched to inference behavior.

In train() mode the BN layer uses the mean and variance of each batch of data, and dropout randomly zeroes part of the activations at every step, so only the remaining connections contribute to that parameter update.
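
The mode switch can be observed through the training flag that every module carries. This is a sketch on a tiny made-up network, not the AlexNet from this post:

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 4), nn.Dropout(0.5), nn.Linear(4, 2))
x = torch.randn(3, 4)

net.train()
train_flags = [m.training for m in net.modules()]   # all True

net.eval()
eval_flags = [m.training for m in net.modules()]    # all False

# With dropout disabled, two forward passes on the same input agree exactly.
with torch.no_grad():
    same_in_eval = torch.equal(net(x), net(x))

print(train_flags, eval_flags, same_in_eval)
```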

6.

rate = (step + 1) / len(train_loader)
a = "*" * int(rate * 50)
b = "." * int((1-rate) * 50)
print("\rtrainloss:{:^3.0f}%[{}->{}]{:.3f}".format(int(rate * 100),a,b,loss),end="")

Function: display a training progress bar.

\r is a carriage return: it moves the cursor back to the start of the line, so each print overwrites the previous one and the bar updates in place. end="" keeps print from adding a newline.
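
The same progress bar wrapped in a function, as a sketch that runs without a training loop. progress_line is a name chosen for this illustration, and the loss values are made up:

```python
def progress_line(step, total, loss, width=50):
    """Build one progress-bar line; the leading \\r rewinds to the line start."""
    rate = (step + 1) / total
    done = "*" * int(rate * width)
    todo = "." * int((1 - rate) * width)
    return "\rtrain loss: {:^3.0f}%[{}->{}] {:.3f}".format(int(rate * 100), done, todo, loss)


for step in range(4):
    # end="" leaves the cursor on the same line so the next print overwrites it
    print(progress_line(step, total=4, loss=1.0 / (step + 1)), end="")
print()
```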

train.py

import torch
import torch.nn as nn
from torchvision import transforms, datasets, utils
import matplotlib.pyplot as plt
import numpy as np
import torch.optim as optim
from model import AlexNet
import os
import time
import json
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
# select the training device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
# data preprocessing
data_transform = {
    "train": transforms.Compose([transforms.RandomResizedCrop(224),  # random crop, resized to 224 x 224
                                 transforms.RandomHorizontalFlip(),  # random horizontal flip
                                 transforms.ToTensor(),  # convert to a tensor with values in [0, 1]
                                 transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]),  # normalize to [-1, 1]
    "val": transforms.Compose([transforms.Resize((224, 224)),
                               transforms.ToTensor(),
                               transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])}
# load the data
# take the current working directory, join it with "../..", and get the absolute path two levels up, where the data lives
data_root = os.path.abspath(os.path.join(os.getcwd(), "../.."))
image_root = data_root + "/data_set/flower_data"
train_dataset = datasets.ImageFolder(root=image_root + "/train",
                                     transform=data_transform["train"])
train_num = len(train_dataset)
# mapping from class name to index
flower_list = train_dataset.class_to_idx
cla_list = dict((val, key) for key, val in flower_list.items())
# write the dict to a JSON file
# indent sets the indentation width
json_str = json.dumps(cla_list, indent=4)
with open('class_indices.json', 'w') as json_file:
    json_file.write(json_str)
batch = 32
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch, shuffle=True, num_workers=0)
validate_dataset = datasets.ImageFolder(root=image_root + "/val",
                                        transform=data_transform["val"])
val_num = len(validate_dataset)
val_loader = torch.utils.data.DataLoader(validate_dataset, batch_size=batch, shuffle=False, num_workers=0)
# code to view test set image
test_data_iter = iter(val_loader)
test_image, test_label = next(test_data_iter)
# # helper to display images
# def imshow(img):
#     img = img / 2 + 0.5  # undo the normalization
#     npimg = img.numpy()
#     # ToTensor() produced [C, H, W]; matplotlib expects [H, W, C], so transpose back
#     plt.imshow(np.transpose(npimg, (1, 2, 0)))
#     plt.show()
#
#
# # print labels
# print(' '.join('%7s' % cla_list[test_label[j].item()] for j in range(4)))
# # show image
# imshow(utils.make_grid(test_image, padding=0))
# create the network instance
net = AlexNet(num_classes=5, init_weights=True)
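# move the network to the GPU or CPU
net.to(device)
# define the loss function
loss_function = nn.CrossEntropyLoss()
# define the optimizer
optimizer = optim.Adam(net.parameters(), lr=0.0002)
# path for saving the model weights
save_path = './AlexNet_gpu.pth'
# best validation accuracy so far; only the best model is saved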
best_acc = 0
for epoch in range(10):
    # train
    net.train()
    # running_loss accumulates the loss so the epoch average can be reported
    running_loss = 0
    t1 = time.perf_counter()
    for step, data in enumerate(train_loader, start=0):
        images, labels = data
        optimizer.zero_grad()
        output = net(images.to(device))
        loss = loss_function(output, labels.to(device))
        loss.backward()
        optimizer.step()
        # accumulate the loss
        running_loss += loss.item()
        # print training progress
        rate = (step + 1) / len(train_loader)
        a = "*" * int(rate * 50)
        b = "." * int((1-rate) * 50)
        print("\rtrain loss: {:^3.0f}%[{}->{}] {:.3f}".format(int(rate * 100), a, b, loss), end="")
    print()
    print(time.perf_counter() - t1)
    # valid
    net.eval()
    acc = 0.0
    with torch.no_grad():
        for data_set in val_loader:
            test_images, test_labels = data_set
            outputs = net(test_images.to(device))
            predict_y = torch.max(outputs, dim=1)[1]
            acc += (predict_y == test_labels.to(device)).sum().item()
        accurate = acc / val_num
        if accurate > best_acc:
            best_acc = accurate
            torch.save(net.state_dict(), save_path)
        print('[epoch %d] train_loss:%.3f test_accuracy: %.3f' %
              (epoch + 1, running_loss / len(train_loader), accurate))
print("Finished Training")
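
The accuracy count in the validation loop can be traced on a tiny hand-made batch. The scores and labels below are invented for illustration:

```python
import torch

outputs = torch.tensor([[0.1, 2.0, 0.3],    # highest score in column 1
                        [1.5, 0.2, 0.1],    # highest score in column 0
                        [0.0, 0.1, 3.0]])   # highest score in column 2
labels = torch.tensor([1, 0, 1])

# torch.max(..., dim=1) returns (values, indices); [1] picks the predicted class indices
predict_y = torch.max(outputs, dim=1)[1]
correct = (predict_y == labels).sum().item()
accuracy = correct / len(labels)

print(predict_y.tolist(), correct, accuracy)
```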

4. predict.py

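1. torch.unsqueeze()

Function: add a dimension. The transformed image is a three-dimensional tensor [C, H, W], but the network expects a four-dimensional batch [N, C, H, W], so a batch dimension is added at dim=0.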
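
A quick shape check, with a random tensor standing in for a transformed image:

```python
import torch

im = torch.rand(3, 224, 224)           # one image: [C, H, W]
batch = torch.unsqueeze(im, dim=0)     # add the batch dimension in front

print(im.shape, batch.shape)           # [3, 224, 224] and [1, 3, 224, 224]
```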

2.

with torch.no_grad():
    output = torch.squeeze(net(im))
    predict = torch.softmax(output, dim=0)
    predict_cla = torch.argmax(predict).numpy()
print(class_indict[str(predict_cla)], predict[predict_cla].item())

This code does the following:

1) Uses torch.no_grad() to disable gradient calculation

2) Uses torch.squeeze to remove the batch dimension of size 1 from the network output

3) Applies softmax to turn the output into class probabilities

4) Uses torch.argmax to find the index of the largest probability and converts it to a numpy value

5) Prints the matching category from the dictionary together with its predicted probability; .item() extracts a Python scalar from a one-element tensor
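
The steps above can be followed on a fixed set of scores. In this sketch the logits stand in for net(im), the three class names are hypothetical, and .item() is used where the post uses .numpy() so that the index is a plain int:

```python
import torch

logits = torch.tensor([[1.0, 3.0, 2.0]])                    # stand-in for net(im), shape [1, 3]
class_indict = {"0": "daisy", "1": "rose", "2": "tulip"}    # hypothetical labels

with torch.no_grad():
    output = torch.squeeze(logits)                # drop the batch dimension: shape [3]
    predict = torch.softmax(output, dim=0)        # probabilities that sum to 1
    predict_cla = torch.argmax(predict).item()    # index of the largest probability

label = class_indict[str(predict_cla)]            # JSON keys are strings
prob = predict[predict_cla].item()
print(label, prob)
```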

predict.py

import torch
import torchvision.transforms as transforms
from PIL import Image
from model import AlexNet
import matplotlib.pyplot as plt
import json
import os
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
transform = transforms.Compose([transforms.Resize((224, 224)),
                                transforms.ToTensor(),
                                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
 
im = Image.open('test.jpg')
plt.imshow(im)
im = transform(im)
im = torch.unsqueeze(im, dim=0)
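# load the index -> class name mapping written by train.py
try:
    with open('./class_indices.json', 'r') as json_file:
        class_indict = json.load(json_file)
except Exception as e:
    print(e)
    exit(-1)
net = AlexNet(num_classes=5)
# map_location lets weights saved on a GPU load on a CPU-only machine
net.load_state_dict(torch.load('AlexNet_gpu.pth', map_location='cpu'))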
net.eval()
with torch.no_grad():
    output = torch.squeeze(net(im))
    predict = torch.softmax(output, dim=0)
    predict_cla = torch.argmax(predict).numpy()
print(class_indict[str(predict_cla)], predict[predict_cla].item())
plt.show()

Recommended video from an excellent uploader:

3.2 Use PyTorch to build AlexNet and train on the flower classification data set (bilibili)

Origin blog.csdn.net/SL1029_/article/details/129098330