Deep Learning - Convolutional Neural Network (CNN) Full Notes, with Code

 These notes are based on the Bilibili video course [Convolutional Neural Network - CNN] Deep Learning (Tang Yudi takes you to learn AI): a detailed explanation of convolutional neural network theory plus hands-on projects in computer vision and image recognition (哔哩哔哩/bilibili).


Table of contents

Deep Learning Basics

What is deep learning?

machine learning process 

The role of feature engineering

How to extract features

Why deep learning is needed 

 Applications of Deep Learning

Deep Learning Disadvantages

 Traditional Algorithms and Deep Learning

computer vision

Computer Vision Challenges 

Machine learning routines

K-Nearest Neighbors 

K nearest neighbor calculation process

K nearest neighbor analysis

Database sample: CIFAR-10 

Why K-nearest neighbors cannot be used for image classification

Neural Network Basics

linear function 

Where do the weight values in W come from? 

loss function 

Softmax classifier 

 forward propagation

 convolutional neural network

What can convolutional neural networks do? 

 The difference between convolutional network and traditional network

Overall structure 

convolutional layer  

 What does convolution do?

Is it enough to do only one convolution?

The convolutional layer involves parameters

pooling layer 

max pooling

 The specific approach of the overall structure

Feature map changes 

Classic Network—Alexnet 

Classic Network—Vgg 

Classic Network—Resnet 

receptive field 

Project Practice - Building a Recognition Model Based on CNN (Part 1)

First read the data

Convolutional network module construction

Accuracy as an evaluation criterion  

Train the network model  

Project Practice - Building a Recognition Model Based on CNN (Part 2) 

Interpretation of commonly used modules in image recognition

Data preprocessing part

Network module settings

Network model preservation and testing

Data reading and preprocessing operations 

 Make a good data source

>> Data Augmentation

 Read the actual name corresponding to the tag

show the data

 >> Transfer Learning

Load the model provided in models, and directly use the trained weights as initialization parameters

Set which layers need to be trained

optimizer settings

 training module 


Deep Learning Basics

What is deep learning?

Deep learning is a branch of machine learning that generally achieves better results.

A neural network (such as a CNN) is best thought of not as a single algorithm but as a method of feature extraction.

machine learning process 

Data acquisition—feature engineering—model building—evaluation and application 

Feature engineering is the most important and core part.

Deep learning removes part of the manual work in machine learning: it can judge for itself which features to extract and choose the most suitable way to process them, whereas classical machine learning requires hand-crafted features.

The role of feature engineering

  • The data and its features determine the upper bound of the model.
  • Preprocessing and feature extraction are the core.
  • The choice of algorithm and parameters only determines how closely this upper bound is approached.

How to extract features

Why deep learning is needed 

 Deep learning can truly learn which features are most suitable.

The core problem it solves is how to extract features.

 Applications of Deep Learning

Deep learning can do a lot in computer vision and natural language processing; applications include:

  • face recognition
  • medical applications
  • face swapping
  • video restoration

The range of applications is wide.

Deep Learning Disadvantages

Deep learning is computationally very expensive, so training and inference can be slow.

 Traditional Algorithms and Deep Learning

computer vision

 

Computer Vision Challenges 

 

 

Machine learning routines

  • Collect data and assign labels
  • train a classifier
  • test, evaluate

 

K-Nearest Neighbors 

 In the classic example, with K=3 the new point is classified as a triangle; with K=5, as a square.

 

K nearest neighbor calculation process

  1. Compute the distance between the current point and every point in the dataset of known classes
  2. Sort by distance
  3. Select the K points closest to the current point
  4. Determine the frequency of each category among these K points
  5. Return the most frequent category among the K points as the prediction for the current point (a minimal sketch follows this list)
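A minimal NumPy sketch of these five steps (the helper name knn_predict and the array names are illustrative, not from the course):

import numpy as np
 
def knn_predict(X_train, y_train, x, k=3):
    # 1. compute the distance from every known point to the current point
    dists = np.linalg.norm(X_train - x, axis=1)
    # 2./3. sort by distance and keep the k closest points
    nearest = np.argsort(dists)[:k]
    # 4./5. count the labels of those k points and return the most frequent
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]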

K nearest neighbor analysis

  • The KNN algorithm itself is simple and effective; it is a lazy-learning algorithm.
  • The classifier needs no training phase, so the training time complexity is zero.
  • The computational complexity of KNN classification is proportional to the size of the training set: with n training documents, classifying one sample takes O(n).
  • The choice of K, the distance metric, and the classification decision rule are the three basic elements of the algorithm.

 

Database sample: CIFAR-10 

 K nearest neighbors for image classification


 The problem is that KNN looks at all pixels equally: it cannot tell the subject of the image apart from the background.

Why K-nearest neighbors cannot be used for image classification

  • Background dominance is the biggest problem: the distance is dominated by the background, while what we focus on is the subject (the principal component)

How can the machine learn which components are the important ones?


Neural Network Basics

linear function 

 Pixels belonging to the eyes, ears, background, etc. affect the result differently: some pixels push the score toward "cat", others push it away. Each pixel therefore has its own degree of importance, and W is called the weight parameter.

 

 With 3072 pixels, each pixel has its own weight in each category: the same pixel has one weight for the cat category, another for the dog category, and so on; only the values differ. With 10 categories there are 10*3072 weight parameters, i.e. W is a 10*3072 matrix, and x holds the 3072 pixel values as a 3072*1 matrix.

W*x is then a 10*1 matrix: the score for each category.

b is the bias parameter, a 10*1 matrix that serves as a per-category fine-tuning term.

 

  • Suppose the cat image consists of four pixels, so x is a 4*1 matrix, and there are three categories: cat, dog, boat. W is then a 3*4 matrix: 3 for the three categories, 4 for the per-pixel weights within each category; b is a 3*1 bias.
  • A larger value in W means that pixel has a larger influence within that category; for example, 2.1 means that in the dog category the pixel with value 24 carries weight 2.1, while 0 means the pixel hardly matters.
  • The sign in W matters: a positive value promotes the category, a negative value inhibits it (see the sketch after this list).
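A minimal NumPy sketch of the score function f(x) = W*x + b for this 3-class, 4-pixel toy example (the pixel values and random weights are made up for illustration):

import numpy as np
 
x = np.array([56.0, 231.0, 24.0, 2.0])  # the four pixel values, shape (4,)
W = np.random.randn(3, 4) * 0.01        # one row of weights per category
b = np.zeros(3)                         # one bias value per category
 
scores = W @ x + b                      # shape (3,): one score per category
print(scores.argmax())                  # index of the highest-scoring category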

Where do the weight values in W come from? 

The weight values are obtained by optimization. The initial weights can be set to random numbers. In the picture above, the image is clearly a cat, yet the highest score goes to dog. Why?

The data does not change: for a given input image, x is fixed, but W can be changed.

What the neural network does is optimize W so that it fits the data and the current task better.

 W can start as random values; then, during iteration, an optimization method keeps improving the W parameters.

 The weight parameters control the trend of the decision boundary; the bias parameter is only a fine-tuning term.

 

loss function 

 

 For each sample, the hinge (SVM) loss is

L_i = Σ_{j ≠ y_i} max(0, s_j - s_{y_i} + 1)

where s_j is the score of an incorrect category, s_{y_i} is the score of the correct category, and the constant 1 is a tolerance (margin): with margin 0 there is no loss as soon as the correct score barely wins, but when the correct and wrong scores are similar they are hard to distinguish; the +1 separates right from wrong more clearly.

 

 Models A and B can reach the same data loss, but A overfits, which is undesirable. Therefore a regularization penalty term is added when constructing the loss function:

L = (1/N) Σ_i Σ_{j ≠ y_i} max(0, s_j - s_{y_i} + 1) + λ R(W)

 R(W): considers only the weight parameters; with L2 regularization it is the sum of squared weights, e.g. for 10 categories R(W) = W_1^2 + W_2^2 + ... + W_10^2.

λ: the penalty coefficient; the larger λ is, the more overfitting is suppressed, because the regularization penalty weighs more.
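A minimal sketch of this loss for a single sample, assuming the scores, the correct-class index y, and the weights W are given (the function name svm_loss is illustrative):

import numpy as np
 
def svm_loss(scores, y, W, lam=0.1):
    # margins for every incorrect category: max(0, s_j - s_{y_i} + 1)
    margins = np.maximum(0, scores - scores[y] + 1)
    margins[y] = 0                      # the correct category adds no loss
    data_loss = margins.sum()
    reg_loss = lam * np.sum(W ** 2)     # R(W): sum of squared weights
    return data_loss + reg_loss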

Softmax classifier 

 

 

exp: map each score through e^x; e^3.2 ≈ 24.5, 5.1 becomes about 164 after the mapping, and -1.7 becomes about 0.18

Normalization: divide each exponentiated score by the sum of all of them, turning the scores into probabilities

Computing the loss value: the input is the probability of the correct category. The image is actually a cat, but the predicted probability for the cat category is only 0.13, so the loss is L = -log(0.13) = 0.89 (these notes use a base-10 logarithm)
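A minimal sketch of this softmax computation for the worked example above, assuming the correct category is index 0:

import numpy as np
 
scores = np.array([3.2, 5.1, -1.7])    # raw category scores
exp_scores = np.exp(scores)            # e^3.2≈24.5, e^5.1≈164, e^-1.7≈0.18
probs = exp_scores / exp_scores.sum()  # normalize to probabilities
loss = -np.log10(probs[0])             # correct category is index 0
print(probs[0], loss)                  # ≈0.13 and ≈0.89 (base-10 log, as above)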

 forward propagation

 The computation that goes from W and x all the way to the final loss value L is called forward propagation. To update the model and optimize W, gradient descent (driven by backpropagation) is required
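A minimal sketch of one such update step (the gradient dW is random here just to show the update rule; in practice it comes from backpropagating the loss):

import numpy as np
 
W = np.random.randn(10, 3072) * 0.0001  # random initial weights, 10 categories
learning_rate = 1e-3
dW = np.random.randn(*W.shape)          # stand-in for the gradient of L w.r.t. W
W -= learning_rate * dW                 # gradient descent: step downhill on the loss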

 

 convolutional neural network

What can convolutional neural networks do? 

 

 Retrieval: Input an image and return similar results.

 

 

 

 

 The difference between convolutional network and traditional network

 On the left is a traditional network (NN), on the right a convolutional network (CNN). The input on the left is pixels; the input on the right is the image itself.

For example, the input on the left is 784 values, i.e. 784 pixels flattened into a vector, while the input on the right is the 28*28*1 image, which is three-dimensional.

The convolutional network does not flatten the data into a vector first; it extracts features directly from the image data, which stays h*w*c.

Overall structure 

 

 The input layer is the image itself (three-dimensional); convolution extracts features; pooling compresses features

convolutional layer  

 What does convolution do?

 

Different regions of the image data carry different features, with different importance.

The image is first divided into regions, and features are extracted from each region. As in an ordinary neural network, a set of weight parameters is used to compute each feature value.

The 3*3 patch in the blue image is one of the small regions the input is divided into (one patch per 3*3 area), and the small subscript numbers form the weight matrix (convolution kernel) applied to that region. The inner product gives 12, the representative value of this patch.

The green image is the feature map obtained after one convolution.

 RGB

32*32*3: the 3 stands for the three color channels

In the actual computation, each color channel is convolved separately, and the per-channel results are added up

 

 For example, with three color channels, R/G/B are each convolved over the corresponding patch. If the R channel gives 1, the B channel gives 2, and the G channel gives 3, the final result is 1+2+3 = 6

 Different weight parameters produce different feature maps; there can be many of them

 

 

Suppose the input data is 7*7*3.

Filter W0: a randomly initialized set of weight parameters. The leading 3*3 is the convolution kernel size, meaning one feature value is computed for every 3*3 region; the three channels hold different values because the pixel values differ per channel.

Computation: take the inner product, i.e. multiply corresponding positions and sum everything. In the figure, the inner products of the three color channels are 0, 2, and 0. Adding them together with the bias term b gives 0+2+0+b; with b = 1, the first 3*3 region maps to 3.

Filter W1: the same shape as W0, but different values.

The green output is 3*3*2; the 2 is the depth, i.e. the number of feature maps.

 After the first region is done, the window slides to the next region, here moving two cells at a time (the stride can be configured), and so on (a minimal sketch of one such step follows)
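A minimal NumPy sketch of a single convolution step on a 3-channel input, in the spirit of the Filter W0 example (random numbers stand in for the figure's values):

import numpy as np
 
patch = np.random.randint(0, 3, (3, 3, 3))  # one 3*3 region of a 3-channel input
w0 = np.random.randn(3, 3, 3)               # filter W0: a 3*3 kernel per channel
b = 1                                        # bias term
value = (patch * w0).sum() + b               # per-channel inner products summed, plus b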

Is it enough to do only one convolution?

 Once is not enough; many convolutions are needed

Input: 32*32*3

e.g. 6 means 6 different filters, so the generated feature map has 6 channels

The depth of a convolution kernel must match the depth of the data it takes as input

The convolutional layer involves parameters

Convolution kernel stride

The smaller the stride, the more finely image features are extracted, but the slower the computation. Usually 1. 

Increasing the stride shrinks the height and width of the output feature map (see the output size formula below).

Convolution kernel size

A 3*3 region and a 4*4 region cover areas of different size. The smaller the convolution kernel, the more finely image features are extracted. Usually 3*3.

edge padding

 Pixels near the edge of the image are covered by fewer windows than pixels near the middle, so they contribute less to the extracted features. Padding the border moves the original boundary inward, lets edge pixels participate in more computations, and avoids information loss to some extent.

 Why pad with 0 rather than other values?

Any other value would contribute to the inner product with the filter and distort the result; zeros are neutral.

Number of convolution kernels

Equal to the number of feature maps you want to obtain in the end 

Convolution kernel result calculation formula

 If the input is a 32*32*3 image and we convolve with 10 filters of size 5*5*3, stride 1, and border padding 2, what is the output size?

Output size = (H - F + 2P)/S + 1 = (32 - 5 + 2*2)/1 + 1 = 32, so the final output is 32*32*10.

So with suitable padding, a convolution can keep the height and width of the feature map unchanged.
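A small helper, as a sanity check of the formula (H - F + 2P)/S + 1 used above:

def conv_output_size(h, kernel, stride, padding):
    return (h - kernel + 2 * padding) // stride + 1
 
print(conv_output_size(32, kernel=5, stride=1, padding=2))  # -> 32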

Convolution parameter sharing 

The same convolution kernel (the same set of weights) is slid over every position of the image, so the number of parameters does not grow with the image size.

 

 

pooling layer 

 The pooling layer compresses the feature map; it involves no weight computation, it is just a downsampling (selection) operation

 h and w change, but c stays the same

max pooling

 MAX POOLING on a feature map (say 4*4) first divides it into regions and keeps only the largest value in each region (a minimal sketch follows)

Why keep the largest value?

 In a convolutional neural network, larger activation values are generally the more important ones
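A minimal NumPy sketch of 2*2 max pooling on a toy 4*4 feature map:

import numpy as np
 
fmap = np.arange(16).reshape(4, 4)                  # toy 4*4 feature map
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))  # max over each 2*2 region
print(pooled.shape)                                 # (2, 2): h and w are halved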

 The specific approach of the overall structure

 

 First, convolution extracts features. Each convolution is followed by a ReLU, which adds a nonlinear transformation; after every two convolutions, one pooling is applied.

Suppose the final feature map is 32*32*10. How do we classify, i.e. turn it into class probabilities?

A fully connected layer (FC) is required. FC integrates the highly abstract features produced by the preceding convolutions and maps them to class scores, which can then be normalized into a probability for each class; the classifier predicts according to these probabilities.

The FC weight matrix has size [10240, 5]: there are five categories, hence the 5; FC cannot take a three-dimensional input, so the 32*32*10 feature map must first be flattened into a feature vector of length 32*32*10 = 10240.

What counts as a layer? How many layers does the network in this figure have?

Only computations with trainable parameters count as layers: convolutional layers count, ReLU does not, pooling does not, and fully connected layers count. The network in the figure therefore has seven layers.

Feature map changes 

 Transformation: flatten the three-dimensional feature map into a one-dimensional vector

Classic Network—Alexnet 

 

  • 11*11 filters are too coarse; nowadays, the smaller the convolution kernel, the better
  • stride 4 is too large a step; pad 0 means no padding
  • 8 layers: 5 convolutional and 3 fully connected

Classic Network—Vgg 

 

  • All convolution kernels are 3*3: fine-grained
  • The network comes in 16- and 19-layer variants
  • Max pooling loses information, so VGG doubles the number of feature maps after each maxpool to compensate for the loss
  • VGG's classification accuracy is about 15% better than AlexNet's, but its training time is much longer

Why does VGG use 16 layers? Is deeper always better?

 Experiments showed the 16-layer variant works better than the others. As convolutional layers are stacked, not every added layer helps: each layer extracts features from the features of the previous layers, and those are not guaranteed to be better than what was already extracted

Classic Network—Resnet 

 

 Problem: a 56-layer plain network has a higher error rate than a 20-layer one

Analysis: some of the layers between layer 20 and layer 56 must be doing harm

Solution: let x be the input to some block of convolutions. If two convolutions turn out not to help, an extra shortcut line carries x around them and adds it to the convolution output, giving F(x) + x; note that F(x) must have the same shape as x, since the addition is element-wise over the feature maps. If a layer would only increase the loss, the network can drive that layer's parameters toward 0, leaving essentially just x. That layer is then wasted effort, but the result is at least no worse than before.

Effect of the solution: 

 On the left is the plain network: more layers, larger error. On the right is ResNet: more layers, smaller error

 ResNet is a feature extractor; whether the network does classification or regression depends on the loss function and on how the final layer is connected

 

receptive field 

The receptive field of a pixel on some layer's output feature map is the region of the original input image that it is computed from. A small sketch of how to compute it follows.
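A small sketch computing the receptive field of stacked convolution layers from their kernel sizes and strides; the recurrence used here is the standard one, not from the course. It also shows the classic fact that three stacked 3*3 convolutions cover the same 7*7 field as a single 7*7 convolution:

def receptive_field(layers):
    # layers: list of (kernel_size, stride) from first to last
    r, jump = 1, 1
    for k, s in layers:
        r += (k - 1) * jump   # each layer widens the field by (k-1) input steps
        jump *= s             # stride compounds the step size of later layers
    return r
 
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # three stacked 3*3 convs -> 7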

 

 

Project Practice - Building a Recognition Model Based on CNN (Part 1)

Build Convolutional Neural Networks 

 The input and layers of a convolutional network differ somewhat from a traditional neural network and need to be designed accordingly; the training modules are basically the same

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torchvision import datasets,transforms 
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

 

First read the data

  • Build the training set and test set (validation set) separately
  • DataLoader to iteratively fetch data
# Define hyperparameters
input_size = 28  # total image size: 28*28*1, three-dimensional
num_classes = 10  # number of label classes
num_epochs = 3  # total number of training epochs
batch_size = 64  # size of one batch: 64 images
 
# Training set
train_dataset = datasets.MNIST(root='./data',  
                            train=True,   
                            transform=transforms.ToTensor(),  
                            download=True) 
 
# Test set
test_dataset = datasets.MNIST(root='./data', 
                           train=False, 
                           transform=transforms.ToTensor())
 
# Build batch data loaders
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=True)

 

Convolutional network module construction

  • General convolution layer, relu layer, pooling layer can be written as a package
  • Note that the final result of convolution is still a feature map, which needs to be converted into a vector to do classification or regression tasks
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(         # input size (1, 28, 28); conv1 is not a single conv layer but a whole conv block: conv + relu + pooling
            nn.Conv2d(
                in_channels=1,              # grayscale image: number of input feature maps
                out_channels=16,            # number of feature maps to produce
                kernel_size=5,              # convolution kernel size
                stride=1,                   # stride
                padding=2,                  # to keep the output size equal to the input, set padding=(kernel_size-1)/2 if stride=1
            ),                              # output feature map: (16, 28, 28)
            nn.ReLU(),                      # relu layer
            nn.MaxPool2d(kernel_size=2),    # pooling over 2x2 regions; output: (16, 14, 14)
        )
        self.conv2 = nn.Sequential(         # input of the next block: (16, 14, 14)
            nn.Conv2d(16, 32, 5, 1, 2),     # shorthand positional arguments, same meaning as in conv1; output (32, 14, 14)
            nn.ReLU(),                      # relu layer
            nn.MaxPool2d(2),                # output (32, 7, 7)
        )
        self.out = nn.Linear(32 * 7 * 7, 10)   # fully connected layer producing the final result
 
    def forward(self, x):   # forward propagation
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.view(x.size(0), -1)           # flatten; result: (batch_size, 32 * 7 * 7)
        output = self.out(x)
        return output

Accuracy as an evaluation criterion  

def accuracy(predictions, labels):
    pred = torch.max(predictions.data, 1)[1]           # index of the max score = predicted class
    rights = pred.eq(labels.data.view_as(pred)).sum()  # number of correct predictions in the batch
    return rights, len(labels)                         # (correct count, batch size)

Train the network model  

# Instantiate the network
net = CNN() 
# Loss function
criterion = nn.CrossEntropyLoss() 
# Optimizer
optimizer = optim.Adam(net.parameters(), lr=0.001)  # Adam optimizer
 
# Start the training loop
for epoch in range(num_epochs):
    # keep the results of the current epoch
    train_rights = [] 
    
    for batch_idx, (data, target) in enumerate(train_loader):  # loop over every batch in the loader
        net.train()                             
        output = net(data) 
        loss = criterion(output, target) 
        optimizer.zero_grad() 
        loss.backward() 
        optimizer.step() 
        right = accuracy(output, target) 
        train_rights.append(right) 
 
    
        if batch_idx % 100 == 0: 
            
            net.eval() 
            val_rights = [] 
            
            for (data, target) in test_loader:
                output = net(data) 
                right = accuracy(output, target) 
                val_rights.append(right)
                
            # accuracy computation
            train_r = (sum([tup[0] for tup in train_rights]), sum([tup[1] for tup in train_rights]))
            val_r = (sum([tup[0] for tup in val_rights]), sum([tup[1] for tup in val_rights]))
 
            print('Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}\tTrain accuracy: {:.2f}%\tTest accuracy: {:.2f}%'.format(
                epoch, batch_idx * batch_size, len(train_loader.dataset),
                100. * batch_idx / len(train_loader), 
                loss.data, 
                100. * train_r[0].numpy() / train_r[1], 
                100. * val_r[0].numpy() / val_r[1]))

Project Practice - Building a Recognition Model Based on CNN (Part 2) 

Interpretation of commonly used modules in image recognition

Data preprocessing

  • Data augmentation: the transforms module in torchvision provides ready-made, practical functions
  • Data preprocessing: transforms in torchvision also implements it; just call it directly
  • The DataLoader module reads batches of data directly

Network module settings

  • Load a pre-trained model: torchvision ships many classic network architectures that are very convenient to call, and you can continue training from weight parameters trained by others, which is what transfer learning means
  • Note that the task others trained on is not exactly the same as ours, so the last head layer, generally the final fully connected layer, must be replaced with one for our own task
  • During training you can either retrain everything from scratch or train only the final task layer, since the earlier layers are all for feature extraction and the essential task goals are the same

Network model preservation and testing

  • The model can be saved selectively; for example, save it whenever the current accuracy on the validation set is the best so far
  • Load the saved model for actual testing

 

import os
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import torch
from torch import nn
import torch.optim as optim
import torchvision
# install first with: pip install torchvision; it provides the three main modules used below
from torchvision import transforms, models, datasets
#https://pytorch.org/docs/stable/torchvision/index.html
import imageio
import time
import warnings
import random
import sys
import copy
import json
from PIL import Image

Data reading and preprocessing operations 

data_dir = './flower_data/'
train_dir = data_dir + '/train'
valid_dir = data_dir + '/valid'

 Make a good data source

  • All image preprocessing operations are specified in data_transforms
  • ImageFolder assumes images are stored in one folder per category, with the folder name as the category name

>> Data Augmentation

 If there is not enough data, the original images can be flipped, rotated, scaled up and down, etc. to obtain more data


data_transforms = {  # data augmentation for the training set
    'train': transforms.Compose([transforms.RandomRotation(45),  # random rotation, between -45 and 45 degrees
        transforms.CenterCrop(224),  # crop from the center (not random), keeping a 224*224 region
        transforms.RandomHorizontalFlip(p=0.5),  # random horizontal flip: p=0.5 means 50% chance to flip, 50% to leave as-is; 0.5 is the usual choice
        transforms.RandomVerticalFlip(p=0.5),  # random vertical flip
        transforms.ColorJitter(brightness=0.2, contrast=0.1, saturation=0.1, hue=0.1),  # brightness, contrast, saturation, hue
        transforms.RandomGrayscale(p=0.025),  # convert to grayscale with this probability; with 3 channels this means R=G=B
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])  # normalization: (x - mean) / std
    ]),
    # the validation set needs no data augmentation
    'valid': transforms.Compose([transforms.Resize(256),  # resize is needed for validation
        transforms.CenterCrop(224),  # center crop
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])  # normalization
    ]),  # the validation set must be preprocessed the same way as the training set
}
batch_size = 8  # reduce this if GPU memory is insufficient
 
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x]) for x in ['train', 'valid']}  # datasets.ImageFolder(actual path, the preprocessing defined above)
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=batch_size, shuffle=True) for x in ['train', 'valid']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'valid']}
class_names = image_datasets['train'].classes

 Read the actual names corresponding to the labels

with open('cat_to_name.json', 'r') as f:
    cat_to_name = json.load(f)

show the data

  • Note that tensor data must be converted to numpy format, and the normalization must be undone for display

 

def im_convert(tensor):
    """ Display data """
    
    image = tensor.to("cpu").clone().detach()
    image = image.numpy().squeeze()
    image = image.transpose(1,2,0)   # restore H, W, C order
    image = image * np.array((0.229, 0.224, 0.225)) + np.array((0.485, 0.456, 0.406))  # undo normalization: multiply by std first, then add the mean
    image = image.clip(0, 1)
 
    return image
fig = plt.figure(figsize=(20, 12))
columns = 4
rows = 2
 
dataiter = iter(dataloaders['valid'])   # one batch of data
inputs, classes = next(dataiter)        # note: dataiter.next() no longer works with newer PyTorch
 
for idx in range(columns*rows):
    ax = fig.add_subplot(rows, columns, idx+1, xticks=[], yticks=[])
    ax.set_title(cat_to_name[str(int(class_names[classes[idx]]))])
    plt.imshow(im_convert(inputs[idx]))
plt.show()

 >> Transfer Learning

Various problems may be encountered when training a network:

  • There may not be much data at hand, which leads to model overfitting and poor results - use data augmentation
  • Training a network model requires tuning many parameters, which takes a lot of time
  • Training a model from scratch simply takes too long

This is where transfer learning comes in.

 If your own data is insufficient, you can take someone else's model and reuse their trained weight and bias parameters.

But you must make sure your architecture and input/output formats are consistent with theirs.

 

There are usually two options for the earlier layers:

  • A: use someone else's convolutional layers as the initialization of your own weight parameters, then continue training
  • B: freeze someone else's convolutional layers, keep them unchanged, and use their weight parameters as-is

If the amount of data is small, freeze many layers; if it is medium, freeze only the earlier layers; if it is large, you may choose not to freeze anything. 

The fully connected layer is usually redefined and retrained for your own task. 

 

Load the model provided in models, and directly use the trained weights as initialization parameters

model_name = 'resnet'  # many options: ['resnet', 'alexnet', 'vgg', 'squeezenet', 'densenet', 'inception']
# whether to reuse the features someone else has already trained
feature_extract = True 

# Whether to train on GPU
train_on_gpu = torch.cuda.is_available()
 
if not train_on_gpu:
    print('CUDA is not available.  Training on CPU ...')
else:
    print('CUDA is available!  Training on GPU ...')
    
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Transfer learning
def set_parameter_requires_grad(model, feature_extracting):  # freeze selected layers so their parameters are not trained
    if feature_extracting:
        for param in model.parameters():
            param.requires_grad = False
model_ft = models.resnet152()
model_ft  # print the architecture

Refer to the example on the PyTorch official website 

def initialize_model(model_name, num_classes, feature_extract, use_pretrained=True):
    # choose the right model; initialization differs slightly between models
    model_ft = None
    input_size = 0
 
    if model_name == "resnet":
        """ Resnet152
        """
        model_ft = models.resnet152(pretrained=use_pretrained)  # pretrained: whether to download and use the pretrained weights
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.fc.in_features  # input size of the original fully connected layer (2048)
        model_ft.fc = nn.Sequential(nn.Linear(num_ftrs, 102),
                                   nn.LogSoftmax(dim=1))  # replace with a new fully connected layer: 2048 -> 102 classes
        input_size = 224
 
    elif model_name == "alexnet":
        """ Alexnet
        """
        model_ft = models.alexnet(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier[6].in_features
        model_ft.classifier[6] = nn.Linear(num_ftrs,num_classes)
        input_size = 224
 
    elif model_name == "vgg":
        """ VGG11_bn
        """
        model_ft = models.vgg16(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier[6].in_features
        model_ft.classifier[6] = nn.Linear(num_ftrs,num_classes)
        input_size = 224
 
    elif model_name == "squeezenet":
        """ Squeezenet
        """
        model_ft = models.squeezenet1_0(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        model_ft.classifier[1] = nn.Conv2d(512, num_classes, kernel_size=(1,1), stride=(1,1))
        model_ft.num_classes = num_classes
        input_size = 224
 
    elif model_name == "densenet":
        """ Densenet
        """
        model_ft = models.densenet121(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier.in_features
        model_ft.classifier = nn.Linear(num_ftrs, num_classes)
        input_size = 224
 
    elif model_name == "inception":
        """ Inception v3
        Be careful, expects (299,299) sized images and has auxiliary output
        """
        model_ft = models.inception_v3(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        # Handle the auxiliary net
        num_ftrs = model_ft.AuxLogits.fc.in_features
        model_ft.AuxLogits.fc = nn.Linear(num_ftrs, num_classes)
        # Handle the primary net
        num_ftrs = model_ft.fc.in_features
        model_ft.fc = nn.Linear(num_ftrs,num_classes)
        input_size = 299
 
    else:
        print("Invalid model name, exiting...")
        exit()
 
    return model_ft, input_size

Set which layers need to be trained

model_ft, input_size = initialize_model(model_name, 102, feature_extract, use_pretrained=True)  # whether to freeze some layers and whether to use the pretrained model
 
# GPU computation
model_ft = model_ft.to(device)
 
# file name for saving the model
filename='checkpoint.pth'
 
# Whether to train all layers
params_to_update = model_ft.parameters()
print("Params to learn:")
if feature_extract:
    params_to_update = []
    for name,param in model_ft.named_parameters():
        if param.requires_grad == True:
            params_to_update.append(param)
            print("\t",name)
else:
    for name,param in model_ft.named_parameters():
        if param.requires_grad == True:
            print("\t",name)
model_ft  # after the change, print the network architecture and check the last layer

optimizer settings

# Optimizer settings
optimizer_ft = optim.Adam(params_to_update, lr=1e-2)
scheduler = optim.lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)  # decay the learning rate to 1/10 every 7 epochs
# the last layer already applies LogSoftmax(), so nn.CrossEntropyLoss() cannot be used here: nn.CrossEntropyLoss() is equivalent to LogSoftmax() plus nn.NLLLoss()
criterion = nn.NLLLoss()

 training module 

def train_model(model, dataloaders, criterion, optimizer, num_epochs=25, is_inception=False, filename=filename):  # model, batch-wise data loaders, loss function, optimizer, number of epochs, whether an Inception-style network is used, checkpoint file
    since = time.time()
    best_acc = 0  # keep track of the best accuracy
    """
    checkpoint = torch.load(filename)
    best_acc = checkpoint['best_acc']
    model.load_state_dict(checkpoint['state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer'])
    model.class_to_idx = checkpoint['mapping']
    """
    model.to(device)
 
    val_acc_history = []
    train_acc_history = []
    train_losses = []
    valid_losses = []
    LRs = [optimizer.param_groups[0]['lr']]  # learning rate
 
    best_model_wts = copy.deepcopy(model.state_dict())  # keep a copy of the best weights
 
    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)
 
        # training and validation
        for phase in ['train', 'valid']:
            if phase == 'train':
                model.train()  # training mode
            else:
                model.eval()   # evaluation mode
 
            running_loss = 0.0
            running_corrects = 0
 
            # iterate over all the data
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)  # move to the GPU
                labels = labels.to(device)
 
                # zero the gradients
                optimizer.zero_grad()
                # compute and update gradients only during training
                with torch.set_grad_enabled(phase == 'train'):
                    if is_inception and phase == 'train':
                        outputs, aux_outputs = model(inputs)
                        loss1 = criterion(outputs, labels)
                        loss2 = criterion(aux_outputs, labels)
                        loss = loss1 + 0.4*loss2
                    else:  # resnet takes this branch
                        outputs = model(inputs)
                        loss = criterion(outputs, labels)
 
                    _, preds = torch.max(outputs, 1)
 
                    # update the weights only in the training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()
 
                # accumulate the loss
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
 
            epoch_loss = running_loss / len(dataloaders[phase].dataset)
            epoch_acc = running_corrects.double() / len(dataloaders[phase].dataset)
            
            
            time_elapsed = time.time() - since
            print('Time elapsed {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))
            
 
            # keep the best model seen so far
            if phase == 'valid' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
                state = {
                  'state_dict': model.state_dict(),
                  'best_acc': best_acc,
                  'optimizer' : optimizer.state_dict(),
                }
                torch.save(state, filename)
            if phase == 'valid':
                val_acc_history.append(epoch_acc)
                valid_losses.append(epoch_loss)
                scheduler.step()  # StepLR takes no metric argument; scheduler comes from the enclosing scope
            if phase == 'train':
                train_acc_history.append(epoch_acc)
                train_losses.append(epoch_loss)
        
        print('Optimizer learning rate : {:.7f}'.format(optimizer.param_groups[0]['lr']))
        LRs.append(optimizer.param_groups[0]['lr'])
        print()
 
    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))
 
    # after training, use the best weights as the final model
    model.load_state_dict(best_model_wts)
    return model, val_acc_history, train_acc_history, valid_losses, train_losses, LRs 
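As a companion to the best-model saving above, a minimal sketch of reloading the checkpoint for actual testing (the dictionary keys match the state dict saved in train_model):

checkpoint = torch.load(filename)                  # filename = 'checkpoint.pth'
model_ft.load_state_dict(checkpoint['state_dict']) # restore the best weights
model_ft.eval()                                    # switch to evaluation mode before testing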

Code reference: "PyTorch image recognition in practice", I am Xiaobai's CSDN blog

Source: blog.csdn.net/weixin_58176527/article/details/125530000