Deep Learning - Getting Started

Difference Between Deep Learning and Machine Learning

Deep learning grew out of the neural network branch of machine learning. Classical machine learning is mostly used to process numerical data, while deep learning can also process data such as images and audio.

Feature Extraction

  • In machine learning, the feature engineering step is done manually and requires substantial professional domain knowledge.
  • Deep learning models usually consist of multiple layers that combine simpler representations, passing data from one layer to the next to build more complex models. The model is derived automatically by training on large amounts of data, with no manual feature extraction.

Data Volume and Computational Performance Requirements 

Machine learning typically requires far less computation time than deep learning. Deep learning models often have a very large number of parameters, which must be trained through many optimization passes over large amounts of data.

Representative Algorithms

  • Machine learning

           Naive Bayes, decision trees, etc.

  • Deep learning

           Neural networks

Neural Networks

An Artificial Neural Network (ANN), also called a Neural Network (NN), is a computational model that mimics the structure and function of biological neural networks (the central nervous system of animals, especially the brain). The classic neural network structure has three kinds of layers: the input layer, the hidden layer, and the output layer.

Each circle in a layer represents a neuron (historically called a perceptron). Neurons in the hidden and output layers compute their outputs from the incoming data, while neurons in the input layer only pass the input along.

Features of Neural Networks 

  • Every connection has a weight
  • There are no connections between neurons within the same layer
  • The layer that produces the final output is also called the fully connected layer

Perceptron (PLA: Perceptron Learning Algorithm)

The perceptron simulates how a neuron in the brain processes incoming signals.

The perceptron is the most basic classification model, similar to logistic regression. The difference is that the perceptron uses the sign function as its activation, while logistic regression uses the sigmoid. Like logistic regression, the perceptron has connection weights and a bias.

u=\sum_{i=1}^{n}w_{i}x_{i}+b

y=\mathrm{sign}(u)=\begin{cases} +1, & u>0 \\ -1, & u\leqslant 0 \end{cases}
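A minimal sketch of the perceptron's forward pass in PyTorch (the weights, bias, and input are assumed example values, not from the original):

import torch

# weighted sum plus bias, then the sign activation
w = torch.tensor([0.5, -0.6])   # weights w_i (assumed)
b = torch.tensor(0.1)           # bias b (assumed)
x = torch.tensor([1.0, 2.0])    # one input sample (assumed)

u = torch.dot(w, x) + b          # u = sum_i w_i * x_i + b
y = 1 if u.item() > 0 else -1    # sign activation: +1 if u > 0, else -1
print(u.item(), y)               # -0.6 -1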

Softmax Regression

Softmax regression converts the raw outputs of a neural network into a probability distribution.

softmax(y)_{i}=\frac{e^{y_{i}}}{\sum_{j=1}^{n}e^{y_{j}}}
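A quick check of the formula with PyTorch's built-in softmax (the logit values are assumed for illustration):

import torch

logits = torch.tensor([2.0, 1.0, 0.1])  # assumed example outputs y_i
probs = torch.softmax(logits, dim=0)    # e^{y_i} / sum_j e^{y_j}
print(probs)                            # tensor([0.6590, 0.2424, 0.0986])
print(probs.sum())                      # tensor(1.) — a valid probability distribution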

Cross-Entropy Loss

Cross entropy measures the gap between the predicted probability distribution \hat{y} and the true label distribution y. For a single sample it is

H(y,\hat{y})=-\sum_{i=1}^{n}y_{i}\log \hat{y}_{i}

and the final loss of the neural network is the average of this loss over all samples.

Code examples for several loss functions:

import torch
from torch.nn import L1Loss
from torch import nn

inputs = torch.tensor([1, 2, 3], dtype=torch.float32)
targets = torch.tensor([1, 2, 5], dtype=torch.float32)

inputs = torch.reshape(inputs, (1, 1, 1, 3))
targets = torch.reshape(targets, (1, 1, 1, 3))

# L1Loss is the mean absolute error; reduction defaults to 'mean' (average), while 'sum' accumulates instead
loss = L1Loss(reduction='sum')  # |1-1| + |2-2| + |5-3| = 2
result = loss(inputs, targets)

loss_mse = nn.MSELoss()  # squared-error loss: ((1-1)^2 + (2-2)^2 + (5-3)^2) / 3 = 4/3 ≈ 1.3333
result_mse = loss_mse(inputs, targets)

print(result)
print(result_mse)


x = torch.tensor([0.1, 0.2, 0.3])
y = torch.tensor([1])
x = torch.reshape(x, (1, 3))
loss_cross = nn.CrossEntropyLoss()  # cross-entropy loss (takes raw logits and applies log-softmax internally)
result_cross = loss_cross(x, y)
print(result_cross)

Convolutional Neural Network

Traditionally, a multi-layer neural network has only an input layer, hidden layers, and an output layer. The number of hidden layers is chosen as needed; there is no clear theoretical derivation of how many layers is appropriate.

The convolutional neural network (CNN) adds a more effective feature-learning part on top of the original multi-layer network: a convolutional layer and a pooling layer are inserted in front of the original fully connected layers. The emergence of convolutional neural networks deepened neural networks, and this is where the "deep" in deep learning comes from.

Structure of Convolutional Neural Networks

The basic composition of a neural network includes an input layer, hidden layers, and an output layer; a convolutional neural network is characterized by hidden layers that are divided into convolutional layers, pooling layers (also called downsampling layers), and activation layers.

  • Convolutional layer: extracts features by sliding over the original image
  • Activation layer: adds non-linear separating capability
  • Pooling layer: reduces the number of parameters to learn and the complexity of the network (max pooling and average pooling)

To produce the classification result, there is also a fully connected layer (Full Connection) as the final output layer, which feeds the loss calculation and outputs the class scores, as in the sketch below.
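A minimal sketch of this structure in PyTorch (the layer sizes are assumptions, chosen for a 32*32 RGB input such as CIFAR10):

from torch import nn

# convolution -> activation -> pooling for feature learning,
# then flatten -> fully connected layer for classification
cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolutional layer: 32x32 -> 32x32
    nn.ReLU(),                                   # activation layer
    nn.MaxPool2d(2),                             # pooling layer: 32x32 -> 16x16
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 10),                 # fully connected layer: 10 class scores
)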

Convolutional Layer

Each convolutional layer in the convolutional neural network is composed of several convolutional units (convolution kernels), and the parameters of each convolutional unit are optimized through the backpropagation algorithm.

The purpose of convolution operation is feature extraction. The first convolutional layer may only extract some low-level features such as edges, lines, and corners. More layers of networks can iteratively extract more complex features from low-level features.

The four elements of a convolution kernel (a kernel is also called a filter):

  • Convolution kernel size
  • Convolution kernel step size
  • Number of convolution kernels
  • Convolution kernel zero padding size

When the number of channels is 1, it is usually called a convolution kernel; when the number of channels is greater than 1, it is usually called a filter.

Calculation of convolution kernel

A convolution kernel can be understood as an observer that looks at a patch of the image with a set of weights and a bias, performing a weighted feature computation.

Note: Commonly used convolution kernel sizes are 1*1, 3*3, and 5*5.

To observe the whole image, the kernel must slide across it; the parameter controlling this is the stride.

If the stride is one pixel, the kernel moves one pixel at a time and visits every position. If the stride is 2, it skips every other position and produces a smaller output.

If, in a given layer, not just one observer but several (multiple convolution kernels) observe together, then multiple observation results are obtained.

  • Different convolution kernels have different weights and biases, i.e. randomly initialized parameters.

In addition, the output size is determined by the kernel size and the stride, and there is also zero padding, because the filter's observation window combined with the stride can sometimes extend past the pixel width of the image.

Zero padding fills pixels with value 0 around the border of the image; how many rings to pad depends on the situation.

Code example of the convolution computation:

import torch
import torch.nn.functional as F

# Input represents the pixel values of the input image
Input = torch.tensor([[23, 12, 220, 43, 2],
                      [45, 57, 25, 122, 91],
                      [189, 15, 149, 222, 76],
                      [12, 35, 57, 29, 20],
                      [4, 34, 65, 8, 18]])
# kernel is the convolution kernel
kernel = torch.tensor([[2.1, 0.7, 1.0],
                       [1.3, 1.2, 3.0],
                       [2.8, 0.3, 0.9]])
# Convert to the dimension layout that conv2d expects: batch_size is the number of images per batch (one matrix here, so 1); this is a grayscale image, so channel is also 1
Input = torch.reshape(Input, (1, 1, 5, 5)).float()  # the default dtype is long, so convert to float; shape becomes batch_size=1, channel=1, height H=5, width W=5
kernel = torch.reshape(kernel, (1, 1, 3, 3)).float()

output = F.conv2d(Input, kernel, stride=1)  # conv2d is 2-D convolution (an image is a 2-D matrix); stride=1 means a step size of 1
print(output)

output2 = F.conv2d(Input, kernel, stride=2)  # stride of 2 (the default is 1)
print(output2)

output3 = F.conv2d(Input, kernel, stride=1, padding=1)  # stride 1 with one ring of zero padding (padding defaults to 0)
print(output3)
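With this 5*5 input and 3*3 kernel, the outputs are 3*3 for stride 1, 2*2 for stride 2, and 5*5 for stride 1 with one ring of zero padding, which matches the output size formula given below.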

Neural network code example:

import torch
import torchvision
from torch import nn
from torch.nn import Conv2d
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

# Download the test split of CIFAR10 (the training split is too large), convert it to tensors, and store it in the pytorch_data folder next to this script
dataset = torchvision.datasets.CIFAR10("pytorch_data", train=False, transform=torchvision.transforms.ToTensor(),
                                       download=True)
dataloader = DataLoader(dataset, batch_size=64)  # 64 images per batch


class Mynn(nn.Module):
    def __init__(self):  # constructor
        super(Mynn, self).__init__()  # call the parent constructor to initialize
        self.conv1 = Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=1, padding=0)

    def forward(self, x):  # forward pass
        x = self.conv1(x)
        return x


mynn = Mynn()

writer = SummaryWriter("logs")  # visualize intermediate results of training (stored in the logs folder)

step = 0
for data in dataloader:
    imgs, targets = data
    output = mynn(imgs)
    # print(imgs.shape)
    # print(output.shape)

    # torch.Size([64, 3, 32, 32])
    writer.add_images("input", imgs, step)
    # torch.Size([64, 6, 30, 30])  -> [xxx, 3, 30, 30]
    output = torch.reshape(output, (-1, 3, 30, 30))  # batch_size is inferred from channel, H, and W, so -1 is a placeholder
    writer.add_images("output", output, step)

    step = step + 1

writer.close()

Run tensorboard --logdir=logs in the terminal from the project directory (the folder here is named logs, hence =logs), then open http://localhost:6006/ to view the visualization.

To speed things up, TensorBoard only displays the image results of some of the steps by default. You can run tensorboard --logdir=logs --samples_per_plugin=images=1000 instead to display more of them; 1000 is the number of images shown, and a larger value displays more.

After opening it, you can see the results of the conv processing (each displayed grid is one batch of 8*8 = 64 images).

Output Size Calculation Formula

For an input of size H_{1}\times W_{1}\times D_{1}, a layer with K filters of size F\times F, stride S, and zero padding P produces an output of size:

H_{2}=\frac{H_{1}-F+2P}{S}+1,\quad W_{2}=\frac{W_{1}-F+2P}{S}+1,\quad D_{2}=K

Note: D is the number of channels, e.g. D1. If the image is in color (RGB), the number of channels is fixed at 3; if it is grayscale, the number of channels is 1.
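The formula as a quick code sketch, checked against the earlier 5*5 conv2d example:

def conv_output_size(n, f, p, s):
    # (N - F + 2P) / S + 1, the output size formula above
    return (n - f + 2 * p) // s + 1

print(conv_output_size(5, 3, 0, 1))  # 3 (stride 1, no padding)
print(conv_output_size(5, 3, 0, 2))  # 2 (stride 2)
print(conv_output_size(5, 3, 1, 1))  # 5 (stride 1, one ring of padding)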

Observing Multi-Channel Images

If the image is in color, there are three tables: R, G, and B. Originally each observer carried one 3*3 (or other size) kernel; now each observer must carry three 3*3 weight tables plus one bias, for a total of 27 weights. In the end, each observer still produces a single result.

Channels

The number of channels can be understood as the depth, or the number of layers.

For example, a color image is produced by superimposing three layers of images.

A filter of size 3 * 3 * 3 likewise has 3 layers (width * height * number of channels).

The number of filters equals the number of feature maps (the number of feature maps is the number of output images, i.e. the number of output channels, out_channels).

The map generated by a convolution is called a feature map (it extracts the features of the original image); each filter generates one feature map.
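A sketch of this in PyTorch (random values, purely illustrative): a 3-channel input convolved with two 3*3*3 filters yields two feature maps.

import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 5, 5)        # batch=1, 3 channels (RGB), 5x5 image
filters = torch.randn(2, 3, 3, 3)  # 2 filters, each 3 channels deep, 3x3
out = F.conv2d(x, filters)         # each filter produces one feature map
print(out.shape)                   # torch.Size([1, 2, 3, 3])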

Activation Function

As neural networks developed, it was found that the original sigmoid and similar activation functions could not achieve good results, so new activation functions were adopted.

ReLU

Advantages of ReLU

  • Effectively mitigates the vanishing-gradient problem.
  • Computation is very fast; SGD (stochastic gradient descent) converges much faster than with sigmoid or tanh.

Disadvantages of sigmoid

  • Computation is expensive, and the gradient tends to vanish during backpropagation.

Activation function sample code:

import torch
import torchvision
from torch import nn
from torch.nn import ReLU, Sigmoid
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

dataset = torchvision.datasets.CIFAR10("pytorch_data", train=False, download=True,
                                       transform=torchvision.transforms.ToTensor())

dataloader = DataLoader(dataset, batch_size=64)


class Mynn(nn.Module):
    def __init__(self):
        super(Mynn, self).__init__()
        # self.relu1 = ReLU()
        self.sigmoid1 = Sigmoid()  # this dataset's values are all non-negative, so ReLU would change nothing; Sigmoid is used for the demo instead

    def forward(self, input):
        output = self.sigmoid1(input)
        return output


mynn = Mynn()

writer = SummaryWriter("logs")
step = 0
for data in dataloader:
    imgs, targets = data
    writer.add_images("input", imgs, global_step=step)
    output = mynn(imgs)
    writer.add_images("output", output, step)
    step += 1

writer.close()

Pooling Layer (Pooling)

Note: For example, if a 2*2 max-pooling window covers the four pixels 1, 1, 5, and 6, the output is the maximum value, 6.
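That example as a quick PyTorch sketch:

import torch
import torch.nn.functional as F

x = torch.tensor([[1., 1.], [5., 6.]]).reshape(1, 1, 2, 2)  # the four pixels above
print(F.max_pool2d(x, kernel_size=2))  # tensor([[[[6.]]]])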

Example code for max pooling:

import torch
import torchvision
from torch import nn
from torch.nn import MaxPool2d
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

dataset = torchvision.datasets.CIFAR10("./pytorch_data", train=False, download=True,
                                       transform=torchvision.transforms.ToTensor())  # ./ also refers to the current directory

dataloader = DataLoader(dataset, batch_size=64)


class Mynn(nn.Module):
    def __init__(self):
        super(Mynn, self).__init__()
        # By default the stride equals the window size (3 here); ceil_mode controls whether a window extending past the image edge still produces an output (the max over the covered part); it defaults to False
        self.maxpool1 = MaxPool2d(kernel_size=3, ceil_mode=False)

    def forward(self, input):
        output = self.maxpool1(input)
        return output


mynn = Mynn()

writer = SummaryWriter("./logs")
step = 0

for data in dataloader:
    imgs, targets = data
    writer.add_images("input", imgs, step)
    output = mynn(imgs)
    writer.add_images("output", output, step)
    step = step + 1

writer.close()

You can see that after pooling the image becomes blurrier, similar to how downscaling a video from 1080p to 720p reduces the space it occupies.

Fully Connected Layer (Full Connection)

The preceding convolution and pooling are equivalent to feature engineering, while the final fully connected layer acts as the "classifier" of the whole network, as in the sketch below.
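A small sketch of that classifier stage (the shapes are assumptions matching the Sequential model in the next example):

import torch
from torch import nn

feature_maps = torch.randn(1, 64, 4, 4)    # e.g. the output of the conv/pool stages
flat = nn.Flatten()(feature_maps)          # -> shape (1, 1024)
logits = nn.Linear(64 * 4 * 4, 10)(flat)   # 10 class scores
print(logits.shape)                        # torch.Size([1, 10])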

Model parameter tuning

Optimizer sample code:

import torch
import torchvision
from torch import nn
from torch.nn import Sequential, Conv2d, MaxPool2d, Flatten, Linear
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader

dataset = torchvision.datasets.CIFAR10("pytorch_data", train=False, transform=torchvision.transforms.ToTensor(),
                                       download=True)

dataloader = DataLoader(dataset, batch_size=1)


class Mynn(nn.Module):
    def __init__(self):
        super(Mynn, self).__init__()
        self.model1 = Sequential(  # placing the layers in a Sequential container keeps the code concise
            Conv2d(3, 32, 5, padding=2),
            MaxPool2d(2),
            Conv2d(32, 32, 5, padding=2),
            MaxPool2d(2),
            Conv2d(32, 64, 5, padding=2),
            MaxPool2d(2),
            Flatten(),  # flatten the data into one row
            Linear(1024, 64),  # linear layer
            Linear(64, 10)
        )

    def forward(self, x):
        x = self.model1(x)
        return x


loss = nn.CrossEntropyLoss()
mynn = Mynn()
optim = torch.optim.SGD(mynn.parameters(), lr=0.01)  # SGD is stochastic gradient descent; lr is the learning rate
for epoch in range(20):  # 20 training epochs
    running_loss = 0.0
    for data in dataloader:
        imgs, targets = data
        outputs = mynn(imgs)
        result_loss = loss(outputs, targets)
        optim.zero_grad()  # zero the optimizer's gradients
        result_loss.backward()  # backpropagation: gradients of the loss guide the weights toward a smaller loss
        optim.step()  # let the optimizer update the parameters
        running_loss = running_loss + result_loss.item()  # .item() extracts the float value
    print(running_loss)  # total loss for the epoch
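The StepLR import above goes unused in this loop; a minimal way to wire it in (an illustrative assumption, not part of the original code) would be:

scheduler = StepLR(optim, step_size=5, gamma=0.1)  # multiply lr by 0.1 every 5 epochs
# then call scheduler.step() once at the end of each epoch, after the batch loop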

Use and modification of existing network models 

Code example for vgg16 classification model:

import torchvision
from torch import nn
from torchvision.models import VGG16_Weights

vgg16_false = torchvision.models.vgg16(weights=None)  # vgg16 is a classification model; weights selects the weight parameters
vgg16_true = torchvision.models.vgg16(weights=VGG16_Weights.DEFAULT)  # a pretrained model (DEFAULT uses the default weights), which tends to perform better

print(vgg16_true)

train_data = torchvision.datasets.CIFAR10('pytorch_data', train=True, transform=torchvision.transforms.ToTensor(),
                                          download=True)

vgg16_true.classifier.add_module('add_linear', nn.Linear(1000, 10))  # add a linear layer inside classifier
print(vgg16_true)

print(vgg16_false)
vgg16_false.classifier[6] = nn.Linear(4096, 10)  # replace layer number 6 of classifier with nn.Linear(4096, 10)
print(vgg16_false)

Note: When filling in the parameters of a layer such as nn.Linear(1000, 10), you can place the cursor inside the parentheses and press Ctrl+P to view the parameter hints in your IDE.

 

Save and load network models 

Save:

import torch
import torchvision
from torch import nn

vgg16 = torchvision.models.vgg16(weights=None)
# Save method 1: model structure + model parameters
torch.save(vgg16, "vgg16_method1.pth")  # vgg16_method1.pth is the saved file name; the extension is conventionally .pth

# Save method 2: model parameters only (officially recommended)
torch.save(vgg16.state_dict(), "vgg16_method2.pth")  # state_dict() stores the state (parameters) as a dictionary

Load:

import torch
import torchvision

# Method 1: load a model saved with save method 1 (structure + parameters)
model = torch.load("vgg16_method1.pth")
# print(model)

# Method 2: load the saved parameters into a freshly built model
vgg16 = torchvision.models.vgg16(weights=None)
vgg16.load_state_dict(torch.load("vgg16_method2.pth"))
# model = torch.load("vgg16_method2.pth")
# print(vgg16)

Note: When loading a model you saved yourself, remember to import the .py file containing the model's network structure. For example, if that file is called model.py, use from model import *.

Using GPU to train the model

Method 1: call .cuda() on the network model, the data, and the loss function.

mynn = Mynn()
mynn=mynn.cuda()

imgs, targets = data
imgs=imgs.cuda()
targets=targets.cuda()

loss = nn.CrossEntropyLoss()
loss=loss.cuda()

Method 2:

# define the training device
device = torch.device("cuda")

mynn = Mynn()
mynn = mynn.to(device)

loss_fn = nn.CrossEntropyLoss()
loss_fn = loss_fn.to(device)

imgs, targets = data
imgs = imgs.to(device)
targets = targets.to(device)
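A common defensive variant (an addition, not from the original) falls back to the CPU when no GPU is available:

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")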

 

Application of the model

model.py

import torch
from torch import nn

# Build the neural network
class Tudui(nn.Module):
    def __init__(self):
        super(Tudui, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(3, 32, 5, 1, 2),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 32, 5, 1, 2),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 5, 1, 2),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(64*4*4, 64),
            nn.Linear(64, 10)
        )

    def forward(self, x):
        x = self.model(x)
        return x


if __name__ == '__main__':
    tudui = Tudui()
    input = torch.ones((64, 3, 32, 32))
    output = tudui(input)
    print(output.shape)

train.py

import torch
import torchvision
from torch.utils.tensorboard import SummaryWriter
from model import *
from torch import nn
from torch.utils.data import DataLoader

# Prepare the datasets
train_data = torchvision.datasets.CIFAR10(root="../data", train=True, transform=torchvision.transforms.ToTensor(),
                                          download=True)
test_data = torchvision.datasets.CIFAR10(root="../data", train=False, transform=torchvision.transforms.ToTensor(),
                                         download=True)

# dataset lengths
train_data_size = len(train_data)
test_data_size = len(test_data)
# e.g. if train_data_size = 10, this prints: Length of the training dataset: 10
print("Length of the training dataset: {}".format(train_data_size))
print("Length of the test dataset: {}".format(test_data_size))

# Load the datasets with DataLoader
train_dataloader = DataLoader(train_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)

# Create the network model
tudui = Tudui()

# Loss function
loss_fn = nn.CrossEntropyLoss()

# Optimizer
# learning_rate = 0.01
# 1e-2 = 1 x 10^(-2) = 1/100 = 0.01
learning_rate = 1e-2
optimizer = torch.optim.SGD(tudui.parameters(), lr=learning_rate)

# Training bookkeeping
# count of training steps
total_train_step = 0
# count of test rounds
total_test_step = 0
# number of training epochs
epoch = 10

# add TensorBoard logging
writer = SummaryWriter("../logs_train")

for i in range(epoch):
    print("-------第 {} 轮训练开始-------".format(i + 1))

    # training phase
    tudui.train()
    for data in train_dataloader:
        imgs, targets = data
        outputs = tudui(imgs)
        loss = loss_fn(outputs, targets)

        # use the optimizer to update the model
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total_train_step = total_train_step + 1
        if total_train_step % 100 == 0:
            print("训练次数:{}, Loss: {}".format(total_train_step, loss.item()))
            writer.add_scalar("train_loss", loss.item(), total_train_step)

    # evaluation phase
    tudui.eval()
    total_test_loss = 0
    total_accuracy = 0
    with torch.no_grad():
        for data in test_dataloader:
            imgs, targets = data
            outputs = tudui(imgs)
            loss = loss_fn(outputs, targets)
            total_test_loss = total_test_loss + loss.item()
            accuracy = (outputs.argmax(1) == targets).sum()  # argmax(1) takes the max along each row; argmax(0) would go down each column
            total_accuracy = total_accuracy + accuracy

    print("整体测试集上的Loss: {}".format(total_test_loss))
    print("整体测试集上的正确率: {}".format(total_accuracy / test_data_size))
    writer.add_scalar("test_loss", total_test_loss, total_test_step)
    writer.add_scalar("test_accuracy", total_accuracy / test_data_size, total_test_step)
    total_test_step = total_test_step + 1

    torch.save(tudui, "tudui_{}.pth".format(i))
    print("模型已保存")

writer.close()

test.py

import torch
import torchvision
from PIL import Image
from torch import nn

image_path = "../imgs/airplane.png"  # check whether this airplane image is predicted as the airplane class
image = Image.open(image_path)
print(image)
image = image.convert('RGB')  # some images have four channels (an extra transparency layer); keeping just the three RGB channels is safest
transform = torchvision.transforms.Compose([torchvision.transforms.Resize((32, 32)),
                                            torchvision.transforms.ToTensor()])

image = transform(image)
print(image.shape)


class Tudui(nn.Module):
    def __init__(self):
        super(Tudui, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(3, 32, 5, 1, 2),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 32, 5, 1, 2),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 5, 1, 2),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(64 * 4 * 4, 64),
            nn.Linear(64, 10)
        )

    def forward(self, x):
        x = self.model(x)
        return x


model = torch.load("tudui_29_gpu.pth", map_location=torch.device('cpu'))  # gpu训练出的模型在cpu运行要加map_location
print(model)
image = torch.reshape(image, (1, 3, 32, 32))
model.eval()
with torch.no_grad():
    output = model(image)
print(output)

print(output.argmax(1))
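To turn the predicted index into a class name (a sketch; this is the class order torchvision's CIFAR10 uses):

classes = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']
print(classes[output.argmax(1).item()])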
