[OUC Deep Learning Introduction] Week 2 Learning Record: Convolutional Neural Network Basics

Table of contents

Part1 video learning

1 Traditional neural network vs convolutional neural network

2 Basic structure

3 Typical structure of convolutional neural network

Part2 code exercise

1 MNIST dataset classification

2 CIFAR10 dataset classification

3 Classify CIFAR10 using VGG16

Part3 problem thinking

1 What is the difference between different values of shuffle in DataLoader?

2 What is the difference when transform takes different values?

3 What is the difference between epoch and batch?

4 What is the difference between 1x1 convolution and FC? What is the main role?

5 Why can residual learning improve accuracy?

6 In the second code exercise, what is the difference between the network and the LeNet proposed by LeCun in 1989?

7 In the second code exercise, the size of the feature map will become smaller after convolution. How to apply Residual Learning?

8 Is there any way to further improve the accuracy?


Part1 video learning

1 Traditional neural network vs convolutional neural network

Basic applications of convolutional neural networks: classification, retrieval, detection, segmentation, recognition, image generation, style transfer, autonomous driving, etc.

1.1 Deep Learning Trilogy

  1. Build a neural network
  2. Select the appropriate loss function: cross entropy loss (cross entropy loss), mean square error (MSE), etc.
  3. Choose an appropriate optimization function to update the parameters: backpropagation (BP), stochastic gradient descent (SGD), etc.

1.2 Loss function

The loss function measures how well the predicted results agree with the ground truth; it guides the adjustment of the parameters/weights W of the convolutional neural network toward a better training result

1.3 Comparison between the two

Both traditional neural networks and convolutional neural networks adopt a hierarchical structure. However, a traditional neural network is fully connected: almost every neuron is connected to all the pixels of the image, so the weight matrix has far too many parameters and the model easily overfits. A convolutional network solves this problem through local connectivity and parameter sharing: each neuron is connected only to a small region of the image, and the parameters of a convolution kernel stay the same as it slides, which greatly reduces the parameter count
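
To make the parameter saving concrete, here is a quick count (a sketch; the layer sizes are illustrative, not taken from the lecture):

import torch.nn as nn

# A fully connected layer in which every neuron sees every pixel of a 3x32x32 image
fc = nn.Linear(3*32*32, 100)             # 3072*100 weights + 100 biases
# A convolutional layer with 100 shared 5x5x3 filters, independent of image size
conv = nn.Conv2d(3, 100, kernel_size=5)  # 100*3*5*5 weights + 100 biases

print(sum(p.nelement() for p in fc.parameters()))    # 307300
print(sum(p.nelement() for p in conv.parameters()))  # 7600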

2 Basic structure

2.1 Convolution

One-dimensional convolution: used in signal processing to compute the delayed accumulation of a signal

Convolution: a mathematical operation on two functions of a real variable. In image processing, images are fed to the neural network in two-dimensional form, so two-dimensional convolution is required

Convolution related concepts

  • Basic form: y = Wx + b, where x is the given image and W is the filter
  • Convolution kernel/filter (kernel/filter)
  • Weight (weight)
  • Receptive field (receptive field): the size of the input region that one convolution operation covers
  • Feature map (activation map/feature map): the map obtained after a convolution; its size depends on the kernel size, stride, and padding: (N + 2*padding - F)/stride + 1 (see the sketch after this list)
  • Padding: pad zeros around the image so that the convolution can take the image edges into account
  • Stride: the distance the kernel slides in one step
  • Depth (depth/channel)
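
A minimal sketch of the feature map size formula above (the function name is my own):

def conv_output_size(n, f, padding=0, stride=1):
    # output size = (N + 2*padding - F)/stride + 1
    return (n + 2*padding - f)//stride + 1

print(conv_output_size(28, 5))          # a 5x5 kernel on a 28x28 MNIST image -> 24
print(conv_output_size(227, 11, 0, 4))  # AlexNet conv1: 227x227 input, 11x11 kernel, stride 4 -> 55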

Visualization of convolution: output the feature maps of a given layer to observe what features that layer has learned

2.2 Pooling

Pooling: its structure and operation are similar to convolution. It usually sits between two convolutional layers or between fully connected layers. It reduces the number of parameters and the amount of computation while retaining the main features, which mitigates overfitting and improves the generalization ability of the model

Types of pooling

  • Max pooling: more commonly used in classification/recognition tasks (see the toy example after this list)
  • Average pooling
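
A toy check of the two pooling types (a sketch using torch.nn.functional):

import torch
import torch.nn.functional as F

x = torch.tensor([[1., 2.], [3., 4.]]).reshape(1, 1, 2, 2)  # N, C, H, W
print(F.max_pool2d(x, kernel_size=2))  # tensor([[[[4.]]]]) - keeps the strongest response
print(F.avg_pool2d(x, kernel_size=2))  # tensor([[[[2.5]]]]) - averages the region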

2.3 Full connection

Fully connected layer (FC layer): usually at the end of a convolutional neural network; all neurons between the two layers are connected by weights, so the number of parameters is large

3 Typical structure of convolutional neural network

3.1 AlexNet

Model structure: CONV1 + MAXPOOL1 + NORM1 + CONV2 + MAXPOOL2 + NORM2 + CONV3 + CONV4 + CONV5 + MAXPOOL3 + FC6 + FC7 + FC8

 Model Features

  • Big Data Training: ImageNet
  • Non-linear activation function: ReLU, which avoids vanishing gradients in the positive interval, computes quickly, and converges faster than sigmoid
  • Preventing overfitting: Dropout (random deactivation) + data augmentation (e.g., translation, flipping, Gaussian perturbation)
  • Dual GPU implementation

Layer-by-layer analysis

  1. Convolution-ReLU-Pooling
  2. Convolution-ReLU-Pooling
  3. Convolution-ReLU
  4. Convolution-ReLU
  5. Convolution-ReLU-Pooling
  6. Full connection-ReLU-Dropout
  7. Full connection-ReLU-Dropout
  8. Fully Connected-SoftMax

3.2 ZFNet

The network structure is the same as AlexNet's, except that the receptive field in convolutional layer 1 is changed from 11*11 to 7*7 and its stride from 4 to 2, and the numbers of filters in convolutional layers 3, 4, and 5 are changed from 384, 384, 256 to 512, 512, 1024

3.3 VGG

VGG is a deeper network: AlexNet has 8 layers, while VGG has 16-19 layers. VGG is commonly used in transfer learning

Network structure:

 Node information of the 16-layer network:

  • 01:Convolution using 64 filters
  • 02: Convolution using 64 filters + Max pooling
  • 03: Convolution using 128 filters
  • 04: Convolution using 128 filters + Max pooling
  • 05: Convolution using 256 filters
  • 06: Convolution using 256 filters
  • 07: Convolution using 256 filters + Max pooling
  • 08: Convolution using 512 filters
  • 09: Convolution using 512 filters
  • 10: Convolution using 512 filters + Max pooling
  • 11: Convolution using 512 filters
  • 12: Convolution using 512 filters
  • 13: Convolution using 512 filters + Max pooling
  • 14: Fully connected with 4096 nodes
  • 15: Fully connected with 4096 nodes
  • 16: Softmax

 

3.4 GoogLeNet

Overall network structure: 22 layers with parameters (27 if the pooling layers are counted), about 100 independent building-block layers in total; the number of parameters is roughly 1/12 of AlexNet's, and there is no fully connected layer

The role of the Inception module: multiple convolution kernels in parallel increase feature diversity

Inception V2: inserts 1*1 convolutions for dimensionality reduction, which solves the problem that the number of parameters grows too fast as the depth increases
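
A quick parameter count shows why the 1*1 reduction helps (a sketch; the 256/64 channel numbers are illustrative, not GoogLeNet's actual configuration):

import torch.nn as nn

def n_params(m):
    return sum(p.nelement() for p in m.parameters())

direct = nn.Conv2d(256, 256, kernel_size=3, padding=1)  # 3x3 directly on 256 channels
bottleneck = nn.Sequential(
    nn.Conv2d(256, 64, kernel_size=1),             # 1x1 reduces the depth first
    nn.Conv2d(64, 64, kernel_size=3, padding=1),   # cheap 3x3 on only 64 channels
    nn.Conv2d(64, 256, kernel_size=1))             # 1x1 restores the depth

print(n_params(direct), n_params(bottleneck))  # 590080 vs 70016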

Inception V3: replaces large convolution kernels with small ones, further reducing the number of parameters; at the same time, the extra nonlinear activation functions let the network produce more independent features, enhancing its representation ability and speeding up training

Stem part (stem network): convolution-pooling-convolution-convolution-pooling

3.5 ResNet

ResNet (deep residual learning network): apart from the output layer there are no other fully connected layers; the structure is flexible, and very deep networks can be trained

The idea of residuals: strip away the identical main part to highlight the small changes

Part2 code exercise

1 MNIST dataset classification

Code link: (colab)MNIST dataset classification

Deep Convolutional Neural Networks have the following properties:

  • Many layers: compositionality
  • Convolution: locality + stationarity of images
  • Pooling: invariance of object class to translations

1.1 Loading MNIST data

PyTorch provides common datasets such as MNIST and CIFAR10 through torchvision.datasets, which downloads them from a remote server to the local machine. Take MNIST as an example:

torchvision.datasets.MNIST(root, train=True, transform=None, target_transform=None, download=False)

  • root: the local root directory of the downloaded dataset, containing the files training.pt and test.pt
  • train: if True, create the dataset from training.pt, otherwise from test.pt
  • download: if True, download the data from the Internet and put it under the root folder
  • transform: a function/transform that takes a PIL image and returns the transformed data
  • target_transform: a function/transform that takes the target and transforms it

DataLoader is an important class; its commonly used options are:

  • batch_size: the size of each batch
  • shuffle: whether to randomly shuffle the order of the samples
  • num_workers: how many subprocesses to use for data loading

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets,transforms
import matplotlib.pyplot as plt
import numpy

# Count the number of parameters in the model
def get_n_params(model):
    np=0
    for p in list(model.parameters()):
        np += p.nelement()
    return np

# Train on the GPU if one is available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)
# out: cuda:0

input_size = 28*28  # size of the images in MNIST
output_size = 10  # the classes are the digits 0 to 9

train_loader = torch.utils.data.DataLoader(datasets.MNIST('./data',train=True,download=True,
                      transform=transforms.Compose([transforms.ToTensor(),transforms.Normalize((0.1307,),(0.3081,))])),
                      batch_size=64,shuffle=True)

test_loader = torch.utils.data.DataLoader(datasets.MNIST('./data',train=False,
                      transform=transforms.Compose([transforms.ToTensor(),transforms.Normalize((0.1307,),(0.3081,))])),
                      batch_size=1000,shuffle=True)

# Display some of the images in the dataset
plt.figure(figsize=(8,5))
for i in range(20):
    plt.subplot(4,5,i+1)
    image,_ = train_loader.dataset.__getitem__(i)
    plt.imshow(image.squeeze().numpy(), 'gray')
    plt.axis('off');

1.2 Create a network

To define a network, subclass nn.Module and implement its forward method, putting the layers with learnable parameters in the constructor __init__. As long as forward is defined in the nn.Module subclass, the backward function is implemented automatically (via autograd)

# Network structure

class FC2Layer(nn.Module):
  def __init__(self,input_size,n_hidden,output_size):
    # A subclass of nn.Module must call the parent constructor in its own constructor
    # The line below is equivalent to nn.Module.__init__(self)
    super(FC2Layer,self).__init__()
    self.input_size = input_size
    # The network is defined directly with Sequential here; note the contrast with the CNN code below
    self.network = nn.Sequential(
        nn.Linear(input_size,n_hidden), 
        nn.ReLU(), 
        nn.Linear(n_hidden,n_hidden), 
        nn.ReLU(), 
        nn.Linear(n_hidden,output_size), 
        nn.LogSoftmax(dim=1)
    )
  # The forward function specifies how the network runs
  def forward(self,x):
    # view usually appears in the forward function of a model class to reshape the input or output
    # The column count is fixed to input_size=784; -1 lets PyTorch infer the row count
    # Since batch_size is 64, x ends up with 64 rows
    x = x.view(-1,self.input_size) # flatten the multi-dimensional data to 2D
    # print(x.cpu().numpy().shape)  # prints (64,784)
    return self.network(x)
    


class CNN(nn.Module):
  def __init__(self,input_size,n_feature,output_size):
    # Call the parent class constructor
    super(CNN,self).__init__()
    # Pooling and ReLU have no learnable parameters and need not be defined here
    self.n_feature = n_feature
    self.conv1 = nn.Conv2d(in_channels=1,out_channels=n_feature,kernel_size=5)
    self.conv2 = nn.Conv2d(n_feature,n_feature,kernel_size=5)
    self.fc1 = nn.Linear(n_feature*4*4,50)  # the feature map is 4x4 here: 28→24→12→8→4
    self.fc2 = nn.Linear(50,10)    

  # The forward function below defines the structure of the network
  # conv1, conv2, etc. can be reused multiple times
  def forward(self,x,verbose=False):
    x = self.conv1(x)
    x = F.relu(x)
    x = F.max_pool2d(x,kernel_size=2)
    x = self.conv2(x)
    x = F.relu(x)
    x = F.max_pool2d(x,kernel_size=2)
    x = x.view(-1,self.n_feature*4*4)
    x = self.fc1(x)
    x = F.relu(x)
    x = self.fc2(x)
    x = F.log_softmax(x,dim=1)
    return x

1.3 Training on a small fully connected network

Train and test functions:

# Training function
def train(model):
  model.train()
  # Draw samples from train_loader one batch (64 samples) at a time
  for batch_idx,(data,target) in enumerate(train_loader):
    # Move the data to the GPU
    data,target = data.to(device),target.to(device)

    optimizer.zero_grad()
    output = model(data)
    loss = F.nll_loss(output,target)
    loss.backward()
    optimizer.step()
    if batch_idx%100==0:
      print('Train: [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
        batch_idx*len(data),len(train_loader.dataset),
        100.*batch_idx/len(train_loader),loss.item()))

# Test function
def test(model):
  model.eval()
  test_loss = 0
  correct = 0
  for data, target in test_loader:
    # Move the data to the GPU
    data,target = data.to(device),target.to(device)
    # Feed the data to the model to get predictions
    output = model(data)
    # Compute this batch's loss and add it to test_loss
    test_loss += F.nll_loss(output,target,reduction='sum').item()
    # The index with the largest value is the predicted class; save it in pred
    pred = output.data.max(1,keepdim=True)[1]
    # Compare pred with target to count correct predictions, accumulated in correct
    # view_as: reshape target to the same shape as pred
    correct += pred.eq(target.data.view_as(pred)).cpu().sum().item()

  test_loss /= len(test_loader.dataset)
  accuracy = 100.*correct/len(test_loader.dataset)
  print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
      test_loss,correct,len(test_loader.dataset),
      accuracy))
# Train the fully connected network

n_hidden = 8 # number of hidden units

model_fnn = FC2Layer(input_size,n_hidden,output_size)
model_fnn.to(device)
optimizer = optim.SGD(model_fnn.parameters(),lr=0.01,momentum=0.5)
print('Number of parameters: {}'.format(get_n_params(model_fnn)))

train(model_fnn)
test(model_fnn)

1.4 Training on Convolutional Neural Networks

# Train the convolutional neural network

# Training settings 
n_features = 6 # number of feature maps

model_cnn = CNN(input_size,n_features,output_size)
model_cnn.to(device)
optimizer = optim.SGD(model_cnn.parameters(),lr=0.01,momentum=0.5)
print('Number of parameters: {}'.format(get_n_params(model_cnn)))

train(model_cnn)
test(model_cnn)

It can be seen that with a similar number of parameters, the CNN performs better than the simple fully connected network, because the CNN extracts information more effectively through convolution and pooling

1.5 Shuffle the pixel order and train and test again on the two networks

Both convolution and pooling operate on local regions of the image and can therefore exploit the positional relationships between pixels. Now try scrambling the order of the pixels in the images

# Demonstration of shuffling the pixel order

perm = torch.randperm(784)  # given n, returns a random permutation of the integers 0 to n-1
plt.figure(figsize=(8,4))
for i in range(10):
    image,_ = train_loader.dataset.__getitem__(i)
    # permute pixels
    image_perm = image.view(-1,28*28).clone()
    image_perm = image_perm[:,perm]
    image_perm = image_perm.view(-1,1,28,28)
    plt.subplot(4,5,i+1)
    plt.imshow(image.squeeze().numpy(), 'gray')
    plt.axis('off')
    plt.subplot(4,5,i+11)
    plt.imshow(image_perm.squeeze().numpy(),'gray')
    plt.axis('off')

The effect after shuffling the order of pixels:

Function to shuffle pixels:

# Function that shuffles the pixel order of the data in each batch
def perm_pixel(data,perm):
  # Flatten to a 2D matrix
  data_new = data.view(-1,28*28)
  # Shuffle the pixel order
  data_new = data_new[:,perm]
  # Restore the original 4D tensor shape
  data_new = data_new.view(-1,1,28,28)
  return data_new
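
train_perm and test_perm, which are called below, are not shown in this post; a sketch of them, assuming they are simply the earlier train/test functions with perm_pixel applied to every batch:

# Training/testing with the pixel permutation applied to each batch
def train_perm(model, perm):
  model.train()
  for batch_idx, (data, target) in enumerate(train_loader):
    data, target = data.to(device), target.to(device)
    data = perm_pixel(data, perm)  # shuffle the pixels before training
    optimizer.zero_grad()
    loss = F.nll_loss(model(data), target)
    loss.backward()
    optimizer.step()

def test_perm(model, perm):
  model.eval()
  test_loss, correct = 0, 0
  for data, target in test_loader:
    data, target = data.to(device), target.to(device)
    data = perm_pixel(data, perm)  # apply the same permutation as in training
    output = model(data)
    test_loss += F.nll_loss(output, target, reduction='sum').item()
    pred = output.data.max(1, keepdim=True)[1]
    correct += pred.eq(target.data.view_as(pred)).cpu().sum().item()
  test_loss /= len(test_loader.dataset)
  print('Test set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)'.format(
      test_loss, correct, len(test_loader.dataset),
      100.*correct/len(test_loader.dataset)))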

Train and test on a fully connected network:

# Train the fully connected network on shuffled pixels

perm = torch.randperm(784)
n_hidden = 8  # number of hidden units

model_fnn = FC2Layer(input_size,n_hidden,output_size)
model_fnn.to(device)
optimizer = optim.SGD(model_fnn.parameters(),lr=0.01,momentum=0.5)
print('Number of parameters: {}'.format(get_n_params(model_fnn)))

train_perm(model_fnn,perm)
test_perm(model_fnn,perm)

Train and test on a convolutional neural network:

# Train the convolutional neural network on shuffled pixels

perm = torch.randperm(784)
n_features = 6  # number of feature maps

model_cnn = CNN(input_size,n_features,output_size)
model_cnn.to(device)
optimizer = optim.SGD(model_cnn.parameters(),lr=0.01,momentum=0.5)
print('Number of parameters: {}'.format(get_n_params(model_cnn)))

train_perm(model_cnn,perm)
test_perm(model_cnn,perm)

It can be seen that after the pixel order is scrambled, the performance of the convolutional neural network drops: the local relationships between pixels are essential training information for a CNN.

2 CIFAR10 dataset classification

Code link: (colab) CIFAR10 dataset classification

The CIFAR10 dataset contains 10 classes, and the images are 3*32*32. It can be loaded with torchvision. Torchvision datasets output PIL images with values in [0,1]; before use they need to be converted to tensors and normalized to [-1,1]

import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# Train on the GPU if one is available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

transform = transforms.Compose([transforms.ToTensor(),
                transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))])

# shuffle is True for training (shuffling the order increases sample diversity) and False for testing
trainset = torchvision.datasets.CIFAR10(root='./data',train=True,download=True,transform=transform)
trainloader = torch.utils.data.DataLoader(trainset,batch_size=64,shuffle=True,num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data',train=False,download=True,transform=transform)
testloader = torch.utils.data.DataLoader(testset,batch_size=8,shuffle=False,num_workers=2)

classes = ('plane','car','bird','cat','deer','dog','frog','horse','ship','truck')

def imshow(img):
  plt.figure(figsize=(8,8))
  img = img/2+0.5 # map back to [0,1]
  npimg = img.numpy()
  plt.imshow(np.transpose(npimg,(1,2,0)))
  plt.show()

# Get a batch of images
images,labels = next(iter(trainloader))
# Show the images
imshow(torchvision.utils.make_grid(images))
# Show the labels of the first row of images
for j in range(8):
  print(classes[labels[j]])

2.1 Define the network, loss function and optimizer

class Net(nn.Module):
  def __init__(self):
    super(Net,self).__init__()
    self.conv1 = nn.Conv2d(3,6,5)
    self.pool = nn.MaxPool2d(2,2)
    self.conv2 = nn.Conv2d(6,16,5)
    self.fc1 = nn.Linear(16*5*5,120)
    self.fc2 = nn.Linear(120,84)
    self.fc3 = nn.Linear(84,10)

  def forward(self,x):
    x = self.pool(F.relu(self.conv1(x)))
    x = self.pool(F.relu(self.conv2(x)))
    x = x.view(-1,16*5*5)  # the feature map here is 16 channels of 5x5: 32→28→14→10→5
    x = F.relu(self.fc1(x))
    x = F.relu(self.fc2(x))
    x = self.fc3(x)
    return x

# Train on the GPU
net = Net().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(),lr=0.001)

2.2 Training Network

for epoch in range(10):  # train for multiple epochs
    for i,(inputs,labels) in enumerate(trainloader):
        inputs = inputs.to(device)
        labels = labels.to(device)
        # Zero the optimizer's gradients
        optimizer.zero_grad()
        # Forward pass + backward pass + optimizer step
        outputs = net(inputs)
        loss = criterion(outputs,labels)
        loss.backward()
        optimizer.step()
        # Print statistics
        if i%200==0:   
            print('Epoch: %d Minibatch: %5d loss: %.3f' %(epoch+1,i+1,loss.item()))

print('Finished Training')

 Training result:

2.3 Observing the recognition effect

# Get a batch of test images
images,labels = next(iter(testloader))
# Show the images
imshow(torchvision.utils.make_grid(images))
# Show the ground-truth labels of the images
for j in range(8):
    print(classes[labels[j]])

outputs = net(images.to(device))
_,predicted = torch.max(outputs,1)

print("预测结果:")

# 展示预测的结果
for j in range(8):
    print(classes[predicted[j]])

 

A few recognition errors can be seen

2.4 Overall accuracy

The overall accuracy is low and needs to be improved
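
The accuracy-statistics code itself is not reproduced in this post; a minimal sketch of the usual evaluation loop over the test set (using the net, testloader and device defined above):

correct = 0
total = 0
net.eval()
with torch.no_grad():  # no gradients are needed for evaluation
    for images, labels in testloader:
        images, labels = images.to(device), labels.to(device)
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)  # index of the max logit = predicted class
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy on the 10000 test images: %.2f %%' % (100.*correct/total))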

3 Classify CIFAR10 using VGG16

Code link: (colab)VGG_CIFAR10

3.1 Define dataloader

Compared with code exercise 2, the normalization parameters here are different, and random cropping and horizontal flipping are added to the training transform for data augmentation

import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# Train on the GPU if one is available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914,0.4822,0.4465),(0.2023,0.1994,0.2010))])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914,0.4822,0.4465),(0.2023,0.1994,0.2010))])

trainset = torchvision.datasets.CIFAR10(root='./data',train=True,download=True,transform=transform_train)
testset = torchvision.datasets.CIFAR10(root='./data',train=False,download=True,transform=transform_test)

trainloader = torch.utils.data.DataLoader(trainset,batch_size=128,shuffle=True,num_workers=2)
testloader = torch.utils.data.DataLoader(testset,batch_size=128,shuffle=False,num_workers=2)

classes = ('plane','car','bird','cat','deer','dog','frog','horse','ship','truck')

3.2 VGG network definition

A simple VGG network is defined here with the following structure:

  • 64 conv, maxpooling

  • 128 conv, maxpooling

  • 256 conv, 256 conv, maxpooling

  • 512 conv, 512 conv, maxpooling

  • 512 conv, 512 conv, maxpooling

  • softmax

# Simplified VGG

class VGG(nn.Module):
  def __init__(self):
    super(VGG,self).__init__()
    self.cfg = [64,'M',128,'M',256,256,'M',512,512,'M',512,512,'M']
    self.features = self._make_layers(self.cfg)
    self.classifier = nn.Linear(512,10)  # the five max pools reduce a 32x32 input to 1x1x512, so 512 in-features; the out-features match the number of classes (10)

  def forward(self,x):
    out = self.features(x)
    out = out.view(out.size(0),-1)
    out = self.classifier(out)
    return out

  def _make_layers(self,cfg):
    layers = []
    in_channels = 3
    for x in cfg:
      if x=='M':
        layers += [nn.MaxPool2d(kernel_size=2,stride=2)]
      else:
        layers += [nn.Conv2d(in_channels,x,kernel_size=3,padding=1),nn.BatchNorm2d(x),nn.ReLU(inplace=True)]
        in_channels = x
    layers += [nn.AvgPool2d(kernel_size=1,stride=1)]
    return nn.Sequential(*layers)


# Put the network on the GPU
net = VGG().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(),lr=0.001)

3.3 Network training

for epoch in range(10):  # train for multiple epochs
  for i,(inputs,labels) in enumerate(trainloader):
    inputs = inputs.to(device)
    labels = labels.to(device)
    # Zero the optimizer's gradients
    optimizer.zero_grad()
    # Forward pass + backward pass + optimizer step
    outputs = net(inputs)
    loss = criterion(outputs,labels)
    loss.backward()
    optimizer.step()
    # Print training information
    if i%100==0:   
      print('Epoch: %d Minibatch: %5d loss: %.3f' %(epoch+1,i+1,loss.item()))

print('Finished Training')

The loss of the training process:

3.4 Verification accuracy on the test set

Compared with the accuracy obtained with the plain CNN in code exercise 2, the accuracy obtained with VGG here is about 20 percentage points higher. VGG is deeper than the CNN of code exercise 2 and offers far more nonlinear transformations, which makes it easier for the network to fit the nonlinear mapping needed to classify the data

Part3 problem thinking

1 What is the difference between different values of shuffle in DataLoader?

shuffle is a parameter of type bool. When shuffle is True, the data are shuffled while the dataset is loaded; when it is False, they are not. Shuffling makes the sample order different in every epoch, eliminating the effect that a fixed data ordering would have on the training result
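
A toy sketch that makes the difference visible (TensorDataset here simply wraps the integers 0-5 as samples):

import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.arange(6))
for shuffle in (False, True):
    loader = DataLoader(ds, batch_size=6, shuffle=shuffle)
    for epoch in range(2):
        (batch,) = next(iter(loader))  # the single full batch of this epoch
        print('shuffle=%s epoch %d order: %s' % (shuffle, epoch, batch.tolist()))
# shuffle=False prints [0, 1, 2, 3, 4, 5] in every epoch;
# shuffle=True prints a different random order each time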

2 What is the difference when transform takes different values?

transform defines common data preprocessing operations, including normalization, random cropping, flipping, etc. They can be used for data augmentation, making fuller use of the data samples and improving the generalization ability of the trained model

transforms.Normalize(), used in the code exercises, normalizes an image channel by channel so that the data follow a distribution with mean 0 and standard deviation 1, which speeds up model convergence. The underlying formula is x = (x - mean)/std, where mean and std are the mean and standard deviation of the data themselves; these two values need to be computed in advance

Before Normalize, the data lie in [0,1]. Normalize((0.5,0.5,0.5),(0.5,0.5,0.5)) maps them to [-1,1], while Normalize(mean, std) with the dataset's true statistics maps them to a distribution with mean 0 and standard deviation 1
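
The per-channel mean/std values such as (0.4914, 0.4822, 0.4465) used for CIFAR10 in code exercise 3 can be computed in advance from the training set. A sketch of one way to do it (note: the pixel-wise std computed here comes out around (0.247, 0.243, 0.262); the (0.2023, 0.1994, 0.2010) used above appear to follow a slightly different convention):

import torch
import torchvision
import torchvision.transforms as transforms

dataset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True,
                                       transform=transforms.ToTensor())
loader = torch.utils.data.DataLoader(dataset, batch_size=1000)

n = 0
mean = torch.zeros(3)
sq = torch.zeros(3)
for images, _ in loader:             # images: (B, 3, 32, 32), values in [0, 1]
    b = images.size(0)
    flat = images.view(b, 3, -1)
    mean += flat.mean(dim=2).sum(dim=0)
    sq += flat.pow(2).mean(dim=2).sum(dim=0)
    n += b
mean /= n
std = (sq/n - mean.pow(2)).sqrt()    # std = sqrt(E[x^2] - E[x]^2), per channel
print(mean, std)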

3 What is the difference between epoch and batch?

An epoch is one full pass over the training set: one epoch means every sample has been used once. A batch is the group of samples processed together in one training step within an epoch; for example, 50,000 training images with batch_size=64 give 782 batches per epoch (the last one partial).

4 What is the difference between 1x1 convolution and FC? What is the main role?

1*1 convolution is a special case of two-dimensional convolution that can serve for dimensionality reduction: on multi-channel input it adjusts the number of channels, reduces parameters, helps capture patterns along the depth dimension, adds nonlinearity, and can replace FC as a classifier. Compared with FC, a 1*1 convolution shares weights, so it has fewer parameters than an FC layer performing the same function; it preserves spatial position information; and whereas an FC layer requires all training samples to have a uniform size, a 1*1 convolution has no such restriction
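
A sketch showing the relationship: a 1*1 convolution applies the same linear map at every spatial position, so with identical weights it matches an FC layer applied over the channel dimension (the layer sizes here are illustrative):

import torch
import torch.nn as nn

x = torch.randn(1, 64, 7, 7)                    # N, C, H, W
conv = nn.Conv2d(64, 10, kernel_size=1)         # 64*10 weights + 10 biases = 650 parameters
fc = nn.Linear(64, 10)
fc.weight.data = conv.weight.data.view(10, 64)  # reuse exactly the same weights
fc.bias.data = conv.bias.data

y_conv = conv(x)                                # (1, 10, 7, 7)
y_fc = fc(x.permute(0, 2, 3, 1))                # apply the FC at each position: (1, 7, 7, 10)
print(torch.allclose(y_conv, y_fc.permute(0, 3, 1, 2), atol=1e-6))  # True

Unlike an FC layer applied to the flattened image, the same 1*1 convolution works for inputs of any height and width.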

5 Why can residual learning improve accuracy?

The deeper a network is, the more parameters it has and the more complex it becomes. Yet a plain deep network struggles even to realize an identity mapping, which makes good parameters hard to learn. Once residual learning is introduced, the network can realize identity mappings easily; during training, layers that do not help can effectively be skipped, which is more flexible, so the accuracy can improve.

6 In the second code exercise, what is the difference between the network and the LeNet proposed by LeCun in 1989?

The structure of LeNet is roughly as follows:

Code exercise 2 uses max pooling and the ReLU activation function, whereas LeNet uses average pooling and the sigmoid activation function

7 In the second code exercise, the size of the feature map will become smaller after convolution. How to apply Residual Learning?

You can refer to the bottleneck design used in ResNets with more than 50 layers: when the main path shrinks the feature map, apply a 1*1 convolution (with a matching stride) on the shortcut to adjust the channel count and the feature map size so that the two branches can still be added
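
A sketch of such a block (an assumption modeled on ResNet's design, not an official answer from the course): when the main path halves the feature map and changes the channel count, a 1*1 convolution with the same stride on the shortcut keeps the two branches addable:

import torch
import torch.nn as nn
import torch.nn.functional as F

class DownsampleResBlock(nn.Module):
    def __init__(self, in_ch, out_ch, stride=2):
        super(DownsampleResBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # The 1*1 convolution matches both the channel count and the feature map size
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
            nn.BatchNorm2d(out_ch))

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.shortcut(x))  # the residual addition still works

block = DownsampleResBlock(16, 32)
print(block(torch.randn(1, 16, 32, 32)).shape)  # torch.Size([1, 32, 16, 16])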

8 Is there any way to further improve the accuracy?

  1. Add Dropout operation
  2. Try different network structures and adjust the number of layers appropriately
  3. Try different activation functions
  4. Use L1/L2 regularization
  5. Use pre-training: initialize formal training with parameters obtained from pre-training
  6. Try different optimizers and loss functions, try tuning hyperparameters
