1. Detailed explanation of AlexNet network
AlexNet won the 2012 ILSVRC (ImageNet Large Scale Visual Recognition Challenge), raising classification accuracy from the 70%+ of traditional methods to 80%+. It was designed by Hinton and his student Alex Krizhevsky. Deep learning began to develop rapidly from that year onward.
The highlights of the network are:
(1) For the first time, GPUs are used to accelerate network training.
(2) The ReLU activation function is used instead of the traditional Sigmoid and Tanh activation functions.
(3) Local Response Normalization (LRN) is used.
(4) Dropout, which randomly deactivates neurons, is applied in the first two fully connected layers to reduce overfitting.
1. ReLU activation function
To address the slow training convergence caused by sigmoid gradient saturation, AlexNet introduced ReLU. ReLU is a piecewise linear function: the output is 0 when the input is less than or equal to 0, and equals the input when the input is greater than 0. Compared with sigmoid, ReLU has the following advantages:
- Low computational overhead. The forward pass of sigmoid requires an exponential and a reciprocal, while ReLU is a simple thresholded linear output; in backpropagation, the sigmoid derivative again depends on an exponential, whereas the ReLU derivative is always 1 wherever the output is non-zero.
- No gradient saturation for positive inputs. The sigmoid derivative approaches 0 when the input has a large magnitude, while the ReLU derivative stays at 1 for any positive input.
- Sparsity. ReLU sets the output of some neurons to 0, which makes the network sparse, reduces the interdependence of parameters, and alleviates overfitting.
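The saturation argument can be checked numerically. The following is a minimal sketch in plain Python (the function names are illustrative and not part of the original post) that compares the derivatives of sigmoid and ReLU at a few inputs:

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid: needs an exponential and a reciprocal."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Derivative of sigmoid: s * (1 - s); at most 0.25, near 0 for large |x|."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu(x: float) -> float:
    """ReLU: 0 for x <= 0, identity for x > 0."""
    return x if x > 0 else 0.0


def relu_grad(x: float) -> float:
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


for x in (-10.0, -1.0, 0.5, 10.0):
    print(f"x={x:6.1f}  sigmoid'={sigmoid_grad(x):.6f}  relu'={relu_grad(x):.1f}")
```

At x = 10 the sigmoid derivative is already below 1e-4, while the ReLU derivative is still 1, which is why stacked sigmoid layers tend to pass back very small gradients.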
2. Use Dropout
Overfitting: the root causes are too many feature dimensions, overly complex model assumptions, too many parameters, too little training data, and too much noise. The fitted function then predicts the training set almost perfectly but performs poorly on the test set of new data; it overfits the training data without taking generalization into account.
Dropout is introduced mainly to prevent overfitting. In a neural network, Dropout works by modifying the structure of the network itself: each neuron of a given layer is set to 0 with a defined probability and then does not participate in forward or backward propagation, as if it had been deleted from the network, while the number of neurons in the input and output layers stays unchanged. The parameters are then updated according to the usual learning method of the network. In the next iteration, another random set of neurons is deleted (set to 0), and so on until the end of training.
Dropout should be regarded as a great innovation in AlexNet, and it is now one of the standard components of neural networks. Dropout can also be viewed as a form of model combination: the network structure generated in each iteration is different, and combining multiple models effectively reduces overfitting. Dropout needs only about twice the training time to achieve this model combination (an effect similar to averaging), which is very efficient.
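The random deactivation described above can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the exact procedure of the original network: the `dropout` helper and its parameters are hypothetical, and the rescaling of surviving values by 1/(1 - p) follows the "inverted dropout" convention that PyTorch's `F.dropout` uses, which the text above does not mention.

```python
import random


def dropout(values, p=0.5, training=True, rng=random):
    """Zero each value with probability p during training.

    Surviving values are scaled by 1 / (1 - p) (inverted dropout, an
    assumption of this sketch) so the expected activation is unchanged.
    At inference time (training=False) the input is returned untouched.
    """
    if not training or p == 0.0:
        return list(values)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)          # fixed seed so the run is repeatable
activations = [1.0] * 10
print(dropout(activations, p=0.5, rng=rng))        # a mix of 0.0 and 2.0
print(dropout(activations, p=0.5, training=False))  # unchanged at inference
```

A different random mask is drawn on every call, which corresponds to training a different thinned network in each iteration.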
3. The size of the output matrix after convolution is calculated as:
N = (W − F + 2P) / S + 1
where N×N is the output size and:
① Input image size W×W
② Filter size F×F
③ Stride S
④ Number of padding pixels P
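As a quick check of the formula, here is a small sketch in Python. The helper name `conv_output_size` is illustrative; two details are assumptions beyond the formula itself: the division is rounded down (as PyTorch does when it does not divide evenly), and padding may be given as a pair so that an asymmetric padding such as [1, 2] contributes 1 + 2 pixels instead of 2P.

```python
def conv_output_size(w: int, f: int, s: int, p=0) -> int:
    """Output size N = (W - F + total padding) / S + 1, rounded down.

    p is either an int (symmetric padding, total = 2 * p) or a
    (before, after) pair for asymmetric padding (total = before + after).
    """
    total_pad = 2 * p if isinstance(p, int) else sum(p)
    return (w - f + total_pad) // s + 1


# 224x224 input, 11x11 kernel, stride 4, padding [1, 2]
print(conv_output_size(224, 11, 4, (1, 2)))  # 55
# Same layer with symmetric padding 2: (224 - 11 + 4) / 4 is not an integer
print(conv_output_size(224, 11, 4, 2))       # 55 after rounding down
```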
4. Detailed explanation of each layer of network
Conv1 | Maxpool1 | Conv2 | Maxpool2 | Conv3 | Conv4 |
---|---|---|---|---|---|
kernels: 96 | | kernels: 256 | | kernels: 384 | kernels: 384 |
kernel_size: 11 | kernel_size: 3 | kernel_size: 5 | kernel_size: 3 | kernel_size: 3 | kernel_size: 3 |
padding: [1, 2] | padding: 0 | padding: [2, 2] | padding: 0 | padding: [1, 1] | padding: [1, 1] |
stride: 4 | stride: 2 | stride: 1 | stride: 2 | stride: 1 | stride: 1 |
output_size: [55, 55, 96] | output_size: [27, 27, 96] | output_size: [27, 27, 256] | output_size: [13, 13, 256] | output_size: [13, 13, 384] | output_size: [13, 13, 384] |

The parameters of all layers are summarized below:

layer_name | kernel_size | kernel_num | padding | stride |
---|---|---|---|---|
Conv1 | 11 | 96 | [1, 2] | 4 |
Maxpool1 | 3 | None | 0 | 2 |
Conv2 | 5 | 256 | [2, 2] | 1 |
Maxpool2 | 3 | None | 0 | 2 |
Conv3 | 3 | 384 | [1, 1] | 1 |
Conv4 | 3 | 384 | [1, 1] | 1 |
Conv5 | 3 | 256 | [1, 1] | 1 |
Maxpool3 | 3 | None | 0 | 2 |
FC1 | 2048 | None | None | None |
FC2 | 2048 | None | None | None |
FC3 | 1000 | None | None | None |
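The output sizes in the tables can be reproduced by pushing an input through the layer parameters one after another. The sketch below is only a bookkeeping aid: it assumes a 224×224×3 input (the size used in the model code later in this post), rounds divisions down, and the names `LAYERS` and `trace_shapes` are illustrative.

```python
# (layer_name, kernel_size, kernel_num, padding [before, after], stride),
# copied from the table above; pooling layers keep the channel count.
LAYERS = [
    ("Conv1", 11, 96, (1, 2), 4),
    ("Maxpool1", 3, None, (0, 0), 2),
    ("Conv2", 5, 256, (2, 2), 1),
    ("Maxpool2", 3, None, (0, 0), 2),
    ("Conv3", 3, 384, (1, 1), 1),
    ("Conv4", 3, 384, (1, 1), 1),
    ("Conv5", 3, 256, (1, 1), 1),
    ("Maxpool3", 3, None, (0, 0), 2),
]


def trace_shapes(size=224, channels=3):
    """Return [(layer_name, [height, width, channels]), ...] for each layer."""
    shapes = []
    for name, kernel, num, pad, stride in LAYERS:
        size = (size - kernel + sum(pad)) // stride + 1
        if num is not None:          # convolution: channels = number of kernels
            channels = num
        shapes.append((name, [size, size, channels]))
    return shapes


for name, shape in trace_shapes():
    print(f"{name:9s} {shape}")
```

The last pooling layer yields a 6×6×256 feature map, which is flattened before the fully connected layers.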
2. Training and testing
This model is trained on a 5-class flower dataset; send me a private message if you would like a copy.
1. Write the model code net.py
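import torch
from torch import nn
import torch.nn.functional as F


class MyAlexNet(nn.Module):
    def __init__(self):
        super(MyAlexNet, self).__init__()
        # Channel counts are half of those listed in the table above.
        # Unlike the table, this version has no pooling after c1 and uses
        # 2x2 max pooling after c2 and c3; the final feature map is still 6x6.
        self.c1 = nn.Conv2d(in_channels=3, out_channels=48, kernel_size=11, stride=4, padding=2)
        self.ReLU = nn.ReLU()
        self.c2 = nn.Conv2d(in_channels=48, out_channels=128, kernel_size=5, stride=1, padding=2)
        self.s2 = nn.MaxPool2d(2)
        self.c3 = nn.Conv2d(in_channels=128, out_channels=192, kernel_size=3, stride=1, padding=1)
        self.s3 = nn.MaxPool2d(2)
        self.c4 = nn.Conv2d(in_channels=192, out_channels=192, kernel_size=3, stride=1, padding=1)
        self.c5 = nn.Conv2d(in_channels=192, out_channels=128, kernel_size=3, stride=1, padding=1)
        self.s5 = nn.MaxPool2d(kernel_size=3, stride=2)
        self.flatten = nn.Flatten()
        self.f6 = nn.Linear(4608, 2048)  # 128 channels * 6 * 6 = 4608
        self.f7 = nn.Linear(2048, 2048)
        self.f8 = nn.Linear(2048, 1000)
        self.f9 = nn.Linear(1000, 5)  # 5 flower classes

    def forward(self, x):
        x = self.ReLU(self.c1(x))  # [N, 48, 55, 55]
        x = self.ReLU(self.c2(x))  # [N, 128, 55, 55]
        x = self.s2(x)             # [N, 128, 27, 27]
        x = self.ReLU(self.c3(x))  # [N, 192, 27, 27]
        x = self.s3(x)             # [N, 192, 13, 13]
        x = self.ReLU(self.c4(x))  # [N, 192, 13, 13]
        x = self.ReLU(self.c5(x))  # [N, 128, 13, 13]
        x = self.s5(x)             # [N, 128, 6, 6]
        x = self.flatten(x)        # [N, 4608]
        # ReLU between the fully connected layers keeps the classifier non-linear.
        # training=self.training turns dropout off after model.eval();
        # F.dropout defaults to training=True and would otherwise stay active.
        x = self.ReLU(self.f6(x))
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.ReLU(self.f7(x))
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.ReLU(self.f8(x))
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.f9(x)
        return x


if __name__ == '__main__':
    # Quick shape check with a random 224x224 RGB input.
    x = torch.rand([1, 3, 224, 224])
    model = MyAlexNet()
    y = model(x)
    print(y.shape)  # torch.Size([1, 5])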
2. Write the model train.py
import os
import json

import torch
from PIL import Image
from torchvision import transforms

# As posted, this script runs batch prediction with a MobileNetV3 model from model_v3.
# To test the AlexNet above, import MyAlexNet from net.py and load its own weights instead.
from model_v3 import mobilenet_v3_large


def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    data_transform = transforms.Compose(
        [transforms.Resize(256),
         transforms.CenterCrop(224),
         transforms.ToTensor(),
         transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

    # load images: folder whose images will all be predicted
    imgs_root = r"D:/other/ClassicalModel/data/flower_datas/train/tulips"
    assert os.path.exists(imgs_root), f"file: '{imgs_root}' does not exist."
    # collect the paths of all .jpg images in the folder
    img_path_list = [os.path.join(imgs_root, i) for i in os.listdir(imgs_root) if i.endswith(".jpg")]

    # read class_indict (maps class index to class name)
    json_path = r"D:/other/ClassicalModel/ResNet/class_indices.json"
    assert os.path.exists(json_path), f"file: '{json_path}' does not exist."
    with open(json_path, "r") as json_file:
        class_indict = json.load(json_file)

    # create model
    model = mobilenet_v3_large(num_classes=5).to(device)

    # load model weights
    weights_path = r"D:/other/ClassicalModel/MobileNet/runs12/mobilenet_v3_large.pth"
    assert os.path.exists(weights_path), f"file: '{weights_path}' does not exist."
    model.load_state_dict(torch.load(weights_path, map_location=device))

    # create a new output folder (detect1, detect2, ...) for the prediction record
    filename = 'record.txt'
    save_path = 'detect'
    path_num = 1
    while os.path.exists(save_path + f'{path_num}'):
        path_num += 1
    os.mkdir(save_path + f'{path_num}')

    actual_classes = "tulips"
    acc_num = 0
    all_num = len(img_path_list)
    assert all_num > 0, f"no .jpg images found in '{imgs_root}'."

    # prediction
    model.eval()
    batch_size = 8  # number of images packed into one batch
    with open(save_path + f'{path_num}/' + filename, 'w') as f, torch.no_grad():
        f.write("imgs_root:" + imgs_root + "\n")
        f.write("weights_path:" + weights_path + "\n")
        # step by batch_size so that the final partial batch is also predicted
        for start in range(0, all_num, batch_size):
            img_list = []
            for img_path in img_path_list[start: start + batch_size]:
                assert os.path.exists(img_path), f"file: '{img_path}' does not exist."
                img = Image.open(img_path).convert("RGB")  # Normalize expects 3 channels
                img_list.append(data_transform(img))
            # stack all images in img_list into one batch
            batch_img = torch.stack(img_list, dim=0)
            # predict class
            output = model(batch_img.to(device)).cpu()
            predict = torch.softmax(output, dim=1)
            probs, classes = torch.max(predict, dim=1)
            for idx, (pro, cla) in enumerate(zip(probs, classes)):
                class_name = class_indict[str(cla.item())]
                line = "image: {} class: {} prob: {:.3}".format(img_path_list[start + idx], class_name, pro.item())
                print(line)
                f.write(line + "\n")
                if class_name == actual_classes:
                    acc_num += 1
        summary = "classes:{},acc_num:{:d},all_num:{:d},accuracy: {:.3f}".format(
            actual_classes, acc_num, all_num, acc_num / all_num)
        print(summary)
        f.write(summary)


if __name__ == '__main__':
    main()
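One detail of the batching in the script above deserves care: when the number of images is not a multiple of the batch size, counting batches with floor division leaves the trailing images unpredicted, while the accuracy is still divided by the total number of images. A minimal sketch of chunking that keeps the last partial batch (the helper `iter_batches` and the file names are hypothetical, for illustration only):

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items,
    including the final, possibly shorter, slice."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


paths = [f"img_{i}.jpg" for i in range(19)]   # hypothetical file names
batches = list(iter_batches(paths, 8))
print([len(b) for b in batches])              # [8, 8, 3]
print(len(paths) // 8)                        # 2: floor division would skip 3 images
```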