AlexNet network model construction and training

1. Detailed explanation of AlexNet network

AlexNet is the champion network of the ILSVRC 2012 (ImageNet Large Scale Visual Recognition Challenge) competition, raising classification accuracy from the traditional 70%+ to 80%+. It was designed by Hinton and his student Alex Krizhevsky, and it was after that year that deep learning began to develop rapidly.


The highlights of the network are:

(1) For the first time, GPUs were used to accelerate network training.

(2) The ReLU activation function was used instead of the traditional Sigmoid and Tanh activation functions.

(3) LRN (Local Response Normalization) was used (a brief sketch follows this list).

(4) Dropout (randomly deactivating neurons) was applied in the first two fully connected layers to reduce overfitting.
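
Of these highlights, LRN normalizes each activation by the activity of neighboring channels; it is rarely used in modern networks, but PyTorch still provides it as nn.LocalResponseNorm. A minimal sketch, using the constants from the AlexNet paper:

import torch
from torch import nn

# LRN over a neighborhood of 5 channels, with the AlexNet paper's constants
lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)
x = torch.rand(1, 96, 55, 55)  # e.g. a Conv1-sized feature map
print(lrn(x).shape)            # torch.Size([1, 96, 55, 55])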

1. ReLU activation function


To address the slow training convergence caused by sigmoid's gradient saturation, AlexNet introduced ReLU. ReLU is a piecewise linear function: the output is 0 when the input is less than or equal to 0, and equals the input otherwise, i.e. ReLU(x) = max(0, x). Compared with sigmoid, ReLU has the following advantages:

  • Low computational overhead. The forward pass of sigmoid requires exponential and reciprocal operations, while ReLU is a linear output; in backpropagation, sigmoid again involves exponentials, while ReLU's derivative is simply a constant 1 on the positive part.
  • No gradient saturation. Sigmoid's gradient approaches 0 for inputs of large magnitude, which stalls learning; ReLU's gradient is 1 for every positive input (a minimal comparison is sketched after this list).
  • Sparsity. ReLU sets the output of some neurons to 0, which makes the network sparse, reduces the interdependence of parameters, and alleviates overfitting.
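
To make the saturation point concrete, here is a minimal PyTorch sketch (the input value 5.0 is illustrative) comparing the gradients of sigmoid and ReLU at the same large positive input:

import torch

# Same large positive input through both activations
x1 = torch.tensor(5.0, requires_grad=True)
x2 = torch.tensor(5.0, requires_grad=True)

torch.sigmoid(x1).backward()   # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
torch.relu(x2).backward()      # d/dx relu(x) = 1 for x > 0

print(x1.grad)  # tensor(0.0066) -> saturated, almost no gradient flows back
print(x2.grad)  # tensor(1.)     -> constant gradient on the positive side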

2. Using Dropout

Overfitting: the root causes are too many feature dimensions, overly complex model assumptions, too many parameters, too little training data, and too much noise. The fitted function then predicts the training set perfectly but performs poorly on new test data: it fits the training data without taking generalization into account.

Dropout was introduced mainly to prevent overfitting. In a neural network, Dropout is realized by modifying the structure of the network itself: each neuron in a given layer is set to 0 with a defined probability, so that it takes no part in forward or backward propagation, as if it had been deleted from the network, while the number of neurons in the input and output layers stays unchanged. Parameters are then updated by the usual learning procedure, and in the next iteration a different random subset of neurons is deleted (set to 0), and so on until training ends.
Dropout was a major innovation of AlexNet and is now a standard component of neural networks. It can also be viewed as a form of model combination: the network structure generated in each iteration is different, and combining these many models effectively reduces overfitting. Dropout achieves an effect similar to model averaging at only about twice the training time, which is very efficient.
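
As a minimal sketch of this mechanism, PyTorch's nn.Dropout zeroes activations at random during training (scaling the survivors by 1/(1-p)) and passes inputs through unchanged in evaluation mode:

import torch
from torch import nn

drop = nn.Dropout(p=0.5)
x = torch.ones(8)

drop.train()    # training mode: each element is zeroed with probability 0.5,
print(drop(x))  # survivors are scaled by 1/(1-p) = 2, e.g. tensor([2., 0., 2., ...])

drop.eval()     # evaluation mode: dropout is disabled, the input passes through
print(drop(x))  # tensor([1., 1., 1., 1., 1., 1., 1., 1.])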


3. The formula for the output size of the feature map after convolution:

N = (W − F + 2P) / S + 1 (rounded down)

① Input image size W×W

② Filter size F×F

③ Stride S

④ Padding P (number of pixels)
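
As a worked example, Conv1 in the table below maps a 224×224 input with an 11×11 kernel, a total padding of 3 (the asymmetric [1, 2] padding, 1 pixel on one side and 2 on the other), and stride 4 to N = (224 − 11 + 3)/4 + 1 = 55. A small helper, written here as an illustrative sketch that takes the total padding per dimension, checks this:

def conv_output_size(w, f, p_total, s):
    """N = (W - F + 2P) / S + 1, rounded down.

    p_total is the total padding along one dimension (left + right),
    which also covers asymmetric padding such as [1, 2]."""
    return (w - f + p_total) // s + 1

# Conv1 of AlexNet: 224x224 input, 11x11 kernel, padding [1, 2], stride 4
print(conv_output_size(224, 11, 1 + 2, 4))  # 55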

4. Detailed explanation of each network layer


layer_name   kernel_size   kernel_num   padding   stride   output_size
Conv1        11            96           [1, 2]    4        [55, 55, 96]
Maxpool1     3             None         0         2        [27, 27, 96]
Conv2        5             256          [2, 2]    1        [27, 27, 256]
Maxpool2     3             None         0         2        [13, 13, 256]
Conv3        3             384          [1, 1]    1        [13, 13, 384]
Conv4        3             384          [1, 1]    1        [13, 13, 384]
Conv5        3             256          [1, 1]    1        [13, 13, 256]
Maxpool3     3             None         0         2        [6, 6, 256]
FC1          2048          None         None      None     2048
FC2          2048          None         None      None     2048
FC3          1000          None         None      None     1000
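
To sanity-check the output sizes in the table, the helper from the sketch above can be chained through every convolution and pooling layer (padding is again given as the total per dimension):

def conv_output_size(w, f, p_total, s):  # as defined above
    return (w - f + p_total) // s + 1

layers = [
    # (name, kernel, total padding, stride)
    ("Conv1",    11, 1 + 2, 4),
    ("Maxpool1",  3, 0,     2),
    ("Conv2",     5, 2 + 2, 1),
    ("Maxpool2",  3, 0,     2),
    ("Conv3",     3, 1 + 1, 1),
    ("Conv4",     3, 1 + 1, 1),
    ("Conv5",     3, 1 + 1, 1),
    ("Maxpool3",  3, 0,     2),
]

n = 224
for name, f, p, s in layers:
    n = conv_output_size(n, f, p, s)
    print("{}: {}x{}".format(name, n, n))  # 55, 27, 27, 13, 13, 13, 13, 6

Note that net.py below uses half of these channel counts (48, 128, 192, 192, 128), so its final feature map is 6×6×128 = 4608, which is why FC1 there is nn.Linear(4608, 2048).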

2. Training and testing

First of all, this model uses a 5-category flower dataset; if you want it, you can private message me. The expected directory layout is sketched below.

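The dataset is assumed to follow the standard one-sub-folder-per-class layout that torchvision's ImageFolder expects (the class names here are those of the common 5-class flower dataset; "tulips" also appears in the prediction script below):

flower_datas/
├── train/
│   ├── daisy/
│   ├── dandelion/
│   ├── roses/
│   ├── sunflowers/
│   └── tulips/
└── val/
    └── (the same five class folders)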

1. Write the model code net.py

import torch
from torch import nn
import torch.nn.functional as F

class MyAlexNet(nn.Module):
    def __init__(self):
        super(MyAlexNet, self).__init__()
        # Channel counts are half of the original AlexNet (48 vs 96, 128 vs 256, ...),
        # i.e. one branch of the original two-GPU split.
        self.c1 = nn.Conv2d(in_channels=3, out_channels=48, kernel_size=11, stride=4, padding=2)
        self.ReLU = nn.ReLU()
        self.c2 = nn.Conv2d(in_channels=48, out_channels=128, kernel_size=5, stride=1, padding=2)
        self.s2 = nn.MaxPool2d(2)
        self.c3 = nn.Conv2d(in_channels=128, out_channels=192, kernel_size=3, stride=1, padding=1)
        self.s3 = nn.MaxPool2d(2)
        self.c4 = nn.Conv2d(in_channels=192, out_channels=192, kernel_size=3, stride=1, padding=1)
        self.c5 = nn.Conv2d(in_channels=192, out_channels=128, kernel_size=3, stride=1, padding=1)
        self.s5 = nn.MaxPool2d(kernel_size=3, stride=2)
        self.flatten = nn.Flatten()
        self.f6 = nn.Linear(4608, 2048)   # 4608 = 128 * 6 * 6 after the last pooling
        self.f7 = nn.Linear(2048, 2048)
        self.f8 = nn.Linear(2048, 1000)
        self.f9 = nn.Linear(1000, 5)      # 5 output classes for the flower dataset

    def forward(self, x):
        x = self.ReLU(self.c1(x))
        x = self.ReLU(self.c2(x))
        x = self.s2(x)
        x = self.ReLU(self.c3(x))
        x = self.s3(x)
        x = self.ReLU(self.c4(x))
        x = self.ReLU(self.c5(x))
        x = self.s5(x)
        x = self.flatten(x)
        x = self.f6(x)
        x = F.dropout(x, p=0.5, training=self.training)  # only drop during training
        x = self.f7(x)
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.f8(x)
        x = F.dropout(x, p=0.5, training=self.training)

        x = self.f9(x)
        return x

if __name__ == '__main__':
    x = torch.rand([1, 3, 224, 224])
    model = MyAlexNet()
    y = model(x)
    print(y.shape)  # expected: torch.Size([1, 5])

2. Write the training script train.py
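
Below is a minimal training sketch. It assumes the MyAlexNet model from net.py above and the ImageFolder dataset layout shown earlier; the dataset path, the save filename alexnet_flower.pth, the batch size, learning rate, and epoch count are illustrative placeholders to adapt to your setup:

import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

from net import MyAlexNet


def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    # ImageNet-style preprocessing with light augmentation for training
    data_transform = transforms.Compose(
        [transforms.RandomResizedCrop(224),
         transforms.RandomHorizontalFlip(),
         transforms.ToTensor(),
         transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

    # one sub-folder per class, as in the layout above (path is a placeholder)
    train_set = datasets.ImageFolder(root=r"D:/other/ClassicalModel/data/flower_datas/train",
                                     transform=data_transform)
    train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

    model = MyAlexNet().to(device)
    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

    epochs = 20
    for epoch in range(epochs):
        model.train()
        running_loss, correct, total = 0.0, 0, 0
        for imgs, labels in train_loader:
            imgs, labels = imgs.to(device), labels.to(device)
            output = model(imgs)
            loss = loss_fn(output, labels)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            correct += (output.argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
        print("epoch {}: loss {:.4f}, acc {:.3f}".format(
            epoch + 1, running_loss / len(train_loader), correct / total))

    torch.save(model.state_dict(), "alexnet_flower.pth")


if __name__ == '__main__':
    main()

3. Write the prediction script predict.py

The following script batch-predicts every image in a folder with the trained weights and records the per-class accuracy to a text file: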

import os
import json

import torch
from PIL import Image
from torchvision import transforms

from net import MyAlexNet


def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    data_transform = transforms.Compose(
        [transforms.Resize(256),
         transforms.CenterCrop(224),
         transforms.ToTensor(),
         transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

    # load image
    # point to the folder of images to iterate over and predict
    imgs_root = r"D:/other/ClassicalModel/data/flower_datas/train/tulips"
    assert os.path.exists(imgs_root), f"file: '{imgs_root}' does not exist."
    # collect the paths of all jpg images under the folder
    img_path_list = [os.path.join(imgs_root, i) for i in os.listdir(imgs_root) if i.endswith(".jpg")]

    # read class_indict
    json_path = r"D:/other/ClassicalModel/ResNet/class_indices.json"
    assert os.path.exists(json_path), f"file: '{json_path}' does not exist."

    with open(json_path, "r") as json_file:
        class_indict = json.load(json_file)

    # create model (the MyAlexNet defined in net.py)
    model = MyAlexNet().to(device)

    # load model weights; point this at the weights saved by train.py
    weights_path = r"alexnet_flower.pth"
    assert os.path.exists(weights_path), f"file: '{weights_path}' does not exist."
    model.load_state_dict(torch.load(weights_path, map_location=device))

    # save prediction records
    filename = 'record.txt'
    save_path = 'detect'
    path_num = 1
    # find the first unused detect<N> directory, e.g. detect1, detect2, ...
    while os.path.exists(save_path + f'{path_num}'):
        path_num += 1
    os.mkdir(save_path + f'{path_num}')
    f = open(save_path + f'{path_num}/' + filename, 'w')
    f.write("imgs_root:" + imgs_root + "\n")
    f.write("weights_path:" + weights_path + "\n")

    actual_classes = "tulips"
    acc_num = 0
    # prediction
    model.eval()
    batch_size = 8  # how many images are packed into one batch per prediction
    # a trailing partial batch (fewer than batch_size images) is skipped below,
    # so count only the images that are actually predicted
    all_num = len(img_path_list) // batch_size * batch_size
    with torch.no_grad():
        for ids in range(0, len(img_path_list) // batch_size):
            img_list = []
            for img_path in img_path_list[ids * batch_size: (ids + 1) * batch_size]:
                assert os.path.exists(img_path), f"file: '{img_path}' does not exist."
                img = Image.open(img_path)
                img = data_transform(img)
                img_list.append(img)

            # batch img
            # stack all images in img_list into a single batch tensor
            batch_img = torch.stack(img_list, dim=0)
            # predict class
            output = model(batch_img.to(device)).cpu()
            predict = torch.softmax(output, dim=1)
            probs, classes = torch.max(predict, dim=1)

            for idx, (pro, cla) in enumerate(zip(probs, classes)):
                print("image: {}  class: {}  prob: {:.3}".format(img_path_list[ids * batch_size + idx],
                                                                 class_indict[str(cla.numpy())],
                                                                 pro.numpy()))
                f.write("image: {}  class: {}  prob: {:.3}\n".format(img_path_list[ids * batch_size + idx],
                                                                 class_indict[str(cla.numpy())],
                                                                 pro.numpy()))
                if class_indict[str(cla.numpy())]==actual_classes:
                    acc_num+=1
    print("classes:{},acc_num:{:d},all_num:{:d},accuracy: {:.3f}".format(actual_classes,acc_num,all_num,acc_num/all_num))
    f.write("classes:{},acc_num:{:d},all_num:{:d},accuracy: {:.3f}".format(actual_classes,acc_num,all_num,acc_num/all_num))
    f.close()
if __name__ == '__main__':
    main()

Origin blog.csdn.net/qq_42076902/article/details/123864381