Author: chen_h
WeChat & QQ: 862251340
WeChat official account: coderpai
(4) PyTorch study notes
What is a convolutional neural network (CNN)?
Convolutional neural networks are a type of artificial neural network that has risen to prominence in recent years. Because they give better results in image and speech recognition, the technique has spread widely. CNNs are most commonly applied to image recognition in computer vision, but thanks to constant innovation they are also used in video analysis, natural language processing, drug discovery, and more. AlphaGo, which recently made headlines by teaching a computer to understand Go, also uses this technology.
Convolution and neural networks
Let's look at how a convolutional neural network actually operates, using image recognition as an example. A neural network is made up of a series of layers, and each layer contains many neurons. These neurons are the key to how the network recognizes things. Every neural network has input and output values. When the input is a picture, what the network actually receives is not a colorful pattern but a pile of numbers. When a network has to process this much input, a convolutional neural network can show its advantages. So what is a convolutional neural network?
Let's split the term into "convolution" and "neural network". Convolution means that the network no longer processes each pixel of the input individually but instead processes small regions of pixels at a time. This strengthens the continuity of the image information, so the network sees shapes rather than isolated points, and it deepens the network's understanding of the image. Specifically, a convolutional neural network has a bank of filters that slide continuously across the picture, collecting information from one small pixel region at a time. The collected information is then consolidated into something meaningful: at this stage the network can see, for example, edges in the picture. The same step is then repeated: a similar bank of filters sweeps over the edge information, and from it the network summarizes higher-level structure, such as eyes and noses built from edges. After another pass of filters, the information about eyes and noses is summarized into a face. Finally, this information is fed into a few ordinary fully connected layers for classification, which tells us which class the input belongs to.
We borrow a clip from a Google video on convolutional neural networks to explain exactly how a picture is convolved. Below is a picture of a cat. A picture has three parameters: length, width, and height. Yes, a picture has height! Here height refers to the amount of information the computer uses to represent color. A black-and-white photo has a height of only 1; a color photo carries red, green, and blue information, so its height is 3. We take a color photo as the example. A filter is something that keeps moving across the picture, collecting one small block of pixels at a time. After it has collected all the information, the output can be understood as a "picture" with greater height and smaller length and width, which contains some edge information. The same process is then repeated: each further convolution compresses the length and width and increases the height, giving a deeper understanding of the input picture. Feeding this compressed, enriched information into ordinary classification layers lets us classify the picture.
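How a filter changes the length and width of a picture can be checked with a little arithmetic. The sketch below uses the standard output-size formula for a convolution along one side, floor((n + 2p - k) / s) + 1. The helper name conv_output_size is made up for this illustration, and the numbers (a 28-pixel side and a 5x5 filter) are the ones used later in this post.

```python
def conv_output_size(n, kernel_size, stride=1, padding=0):
    """Output length along one spatial dimension of a convolution.

    Hypothetical helper for illustration; it applies
    floor((n + 2 * padding - kernel_size) / stride) + 1.
    """
    return (n + 2 * padding - kernel_size) // stride + 1


# A 28-pixel side with a 5x5 filter, stride 1, and no padding shrinks to 24.
print(conv_output_size(28, 5))             # 24

# With padding 2 the side stays at 28, so only the height (channels) changes.
print(conv_output_size(28, 5, padding=2))  # 28

# The same formula covers a 2x2 pooling window moved 2 pixels at a time.
print(conv_output_size(28, 2, stride=2))   # 14
```

The second case shows why a padding of (kernel_size - 1) / 2 with stride 1 leaves the length and width unchanged.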
Pooling
Research has found that a convolutional layer may inadvertently lose some information with each convolution. Pooling is a good solution to this problem. Pooling is a filtering process that selects the useful information in a layer and passes it on to the next layer for analysis, while also reducing the network's computational burden. In other words, during convolution we do not compress the length and width, so that as much information as possible is retained, and we leave the compression to pooling. This extra step can improve accuracy very effectively. With these techniques we can build a convolutional neural network of our own.
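To make the idea of pooling concrete, here is a minimal pure-Python sketch of max pooling with a 2x2 window moved 2 pixels at a time. The function name max_pool_2x2 and the sample values are invented for this illustration; in PyTorch the same job is done by nn.MaxPool2d.

```python
def max_pool_2x2(grid):
    """Max pooling with a 2x2 window and stride 2 on a 2-D list of numbers.

    Illustrative helper only; a trailing odd row or column is dropped.
    """
    rows, cols = len(grid), len(grid[0])
    pooled = []
    for r in range(0, rows - 1, 2):
        pooled_row = []
        for c in range(0, cols - 1, 2):
            window = (grid[r][c], grid[r][c + 1],
                      grid[r + 1][c], grid[r + 1][c + 1])
            pooled_row.append(max(window))  # keep only the strongest response
        pooled.append(pooled_row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
print(max_pool_2x2(feature_map))  # [[4, 2], [2, 8]]
```

Each 2x2 block is reduced to its largest value, so the length and width are halved while the strongest signal in each region survives.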
Popular CNN structure
A popular way to build the network, from bottom to top, is as follows. First comes the input image (Image), which passes through a convolutional layer (Convolution); the convolved information is then processed by pooling (Pooling), here max pooling. The same processing is applied once more, and the resulting information is passed into fully connected layers (Fully Connected), typically two of them. Finally, a classifier (Classifier) produces the class prediction. This is just a brief introduction to how convolutional neural networks process pictures.
CNN: convolutional neural network
Convolutional neural networks are now widely used in image recognition, and many applications have already emerged. Next, we will go step by step through a CNN for handwritten digits.
Below is a visualization of how the last layer of a CNN learns:
MNIST handwritten digit data
import torch
import torch.nn as nn
import torch.utils.data as Data
import torchvision  # dataset module
import matplotlib.pyplot as plt

torch.manual_seed(1)  # reproducible

# Hyper Parameters
EPOCH = 1  # number of passes over the full training set; to save time, we train for only one
BATCH_SIZE = 50
LR = 0.001  # learning rate
DOWNLOAD_MNIST = True  # set to False if you have already downloaded the MNIST data

# MNIST handwritten digits
train_data = torchvision.datasets.MNIST(
    root='./mnist/',  # where to save or load the data
    train=True,  # this is training data
    transform=torchvision.transforms.ToTensor(),  # converts a PIL.Image or numpy.ndarray to a
    # torch.FloatTensor (C x H x W), scaled to the range [0.0, 1.0] during training
    download=DOWNLOAD_MNIST,  # download the data if it is not already present
)
Black areas have a value of 0, and white areas have values greater than 0.
Besides the training data, MNIST also provides test data, which we use to check whether the network has trained well.
test_data = torchvision.datasets.MNIST(root='./mnist/', train=False)
# batch training: 50 samples, 1 channel, 28x28 -> (50, 1, 28, 28)
train_loader = Data.DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)
# to save time, we only test on the first 2000 samples
# (newer torchvision versions expose these attributes as test_data.data and test_data.targets)
test_x = torch.unsqueeze(test_data.test_data, dim=1).type(torch.FloatTensor)[:2000]/255. # shape from (2000, 28, 28) to (2000, 1, 28, 28), value in range(0,1)
test_y = test_data.test_labels[:2000]
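The test set above is loaded without the ToTensor transform, so the channel dimension and the scaling to [0, 1] have to be added by hand. The sketch below repeats that step on a small random stand-in tensor (the name raw and its values are made up for this illustration), so the shape change can be seen without downloading anything.

```python
import torch

# Stand-in for raw MNIST images: 4 fake 28x28 grayscale images
# with uint8 pixel values in 0..255 (random, for illustration only).
raw = torch.randint(0, 256, (4, 28, 28), dtype=torch.uint8)

# Insert a channel dimension at position 1, convert to float,
# and divide by 255 so that values fall in [0, 1].
x = torch.unsqueeze(raw, dim=1).type(torch.FloatTensor) / 255.

print(tuple(raw.shape))  # (4, 28, 28)
print(tuple(x.shape))    # (4, 1, 28, 28)
```

The resulting layout (batch, channel, height, width) is what nn.Conv2d expects as input.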
CNN model
As before, we use a class to build the CNN model. The overall flow is: convolution (Conv2d) -> activation function (ReLU) -> pooling, i.e. downsampling (MaxPool2d) -> the same three steps once more -> flatten the multi-dimensional feature maps -> fully connected layer (Linear) -> output.
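The flow just described can be traced by pushing a dummy batch through the layers and printing the shape after each stage. This is only a sketch: the layers are rebuilt here with the same settings as the CNN class that follows so that the snippet runs on its own, and the all-zero input is a placeholder rather than real MNIST data.

```python
import torch
import torch.nn as nn

# Same layer settings as the CNN class in this post, rebuilt standalone.
conv1 = nn.Sequential(nn.Conv2d(1, 16, 5, 1, 2), nn.ReLU(), nn.MaxPool2d(2))
conv2 = nn.Sequential(nn.Conv2d(16, 32, 5, 1, 2), nn.ReLU(), nn.MaxPool2d(2))
out = nn.Linear(32 * 7 * 7, 10)

x = torch.zeros(50, 1, 28, 28)   # dummy batch: 50 blank single-channel images
h1 = conv1(x)                    # convolution -> ReLU -> max pooling
h2 = conv2(h1)                   # the same three steps once more
flat = h2.view(h2.size(0), -1)   # flatten the feature maps per sample
logits = out(flat)               # one score per digit class

for name, t in [("input", x), ("conv1", h1), ("conv2", h2),
                ("flat", flat), ("output", logits)]:
    print(name, tuple(t.shape))
# input (50, 1, 28, 28)
# conv1 (50, 16, 14, 14)
# conv2 (50, 32, 7, 7)
# flat (50, 1568)
# output (50, 10)
```

The length and width shrink from 28 to 14 to 7 only at the pooling steps, while the height grows from 1 to 16 to 32 at the convolutions.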
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(  # input shape (1, 28, 28)
            nn.Conv2d(
                in_channels=1,  # input height (number of channels)
                out_channels=16,  # n_filters
                kernel_size=5,  # filter size
                stride=1,  # filter movement/step
                padding=2,  # to keep the Conv2d output the same length and width, use padding=(kernel_size-1)/2 when stride=1
            ),  # output shape (16, 28, 28)
            nn.ReLU(),  # activation
            nn.MaxPool2d(kernel_size=2),  # downsample over 2x2 regions, output shape (16, 14, 14)
        )
        self.conv2 = nn.Sequential(  # input shape (16, 14, 14)
            nn.Conv2d(16, 32, 5, 1, 2),  # output shape (32, 14, 14)
            nn.ReLU(),  # activation
            nn.MaxPool2d(2),  # output shape (32, 7, 7)
        )
        self.out = nn.Linear(32 * 7 * 7, 10)  # fully connected layer, output 10 classes

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.view(x.size(0), -1)  # flatten the feature maps to (batch_size, 32 * 7 * 7)
        output = self.out(x)
        return output

cnn = CNN()
print(cnn)  # net architecture
"""
CNN (
  (conv1): Sequential (
    (0): Conv2d(1, 16, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): ReLU ()
    (2): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
  )
  (conv2): Sequential (
    (0): Conv2d(16, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): ReLU ()
    (2): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
  )
  (out): Linear (1568 -> 10)
)
"""
Training
Now we begin training. In older versions of PyTorch, x and y first had to be wrapped in a Variable; since PyTorch 0.4, tensors can be passed to cnn directly. We feed each batch into cnn to compute the output and then compute the error. The code below omits the accuracy calculation.
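Since the accuracy calculation is left out of the training code, here is one way it might be done. This is a sketch, not necessarily the author's exact code: the function name accuracy and the hand-made scores are assumptions for illustration, and it uses the same torch.max(..., 1)[1] idiom that appears later in this post.

```python
import torch


def accuracy(logits, labels):
    """Fraction of rows in `logits` whose largest score matches `labels`.

    Hypothetical helper for illustration only.
    """
    pred = torch.max(logits, 1)[1]  # index of the largest score in each row
    return (pred == labels).float().mean().item()


# Hand-made scores for 4 samples and 3 classes (illustrative values).
logits = torch.tensor([[2.0, 0.1, 0.3],
                       [0.2, 1.5, 0.1],
                       [0.3, 0.2, 0.9],
                       [1.0, 0.8, 0.1]])
labels = torch.tensor([0, 1, 2, 1])
print(accuracy(logits, labels))  # 0.75
```

Inside the training loop, something like accuracy(cnn(test_x), test_y) could be evaluated every so often and printed next to the training loss.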
optimizer = torch.optim.Adam(cnn.parameters(), lr=LR)  # optimize all cnn parameters
loss_func = nn.CrossEntropyLoss()  # the target label is not one-hotted

# training and testing
for epoch in range(EPOCH):
    for step, (b_x, b_y) in enumerate(train_loader):  # iterate over batches; train_loader normalizes x as it yields each batch
        output = cnn(b_x)  # cnn output
        loss = loss_func(output, b_y)  # cross entropy loss
        optimizer.zero_grad()  # clear gradients for this training step
        loss.backward()  # backpropagation, compute gradients
        optimizer.step()  # apply gradients
"""
...
Epoch: 0 | train loss: 0.0306 | test accuracy: 0.97
Epoch: 0 | train loss: 0.0147 | test accuracy: 0.98
Epoch: 0 | train loss: 0.0427 | test accuracy: 0.98
Epoch: 0 | train loss: 0.0078 | test accuracy: 0.98
"""
Finally, let's take 10 samples and check whether the predictions are correct:
test_output = cnn(test_x[:10])
pred_y = torch.max(test_output, 1)[1].data.numpy().squeeze()
print(pred_y, 'prediction number')
print(test_y[:10].numpy(), 'real number')
"""
[7 2 1 0 4 1 4 9 5 9] prediction number
[7 2 1 0 4 1 4 9 5 9] real number
"""
Visualizing training
This part was added on a whim after the video was made; because visualization aids understanding, it is worth mentioning. The visualization code mainly uses matplotlib and sklearn: we use t-SNE dimensionality reduction to visualize the high-dimensional output of the CNN's last layer, i.e. the result of x = x.view(x.size(0), -1) in the CNN forward code.
The visualization code is not the focus here, so we show the results directly.
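For readers curious about the dimensionality-reduction step, here is a minimal sketch using sklearn's TSNE. It is not the original visualization code: the features are random stand-ins for the flattened CNN output, and the sample count and perplexity value are arbitrary choices made for this illustration.

```python
import numpy as np
from sklearn.manifold import TSNE

# Random stand-in for flattened CNN features:
# 60 samples, each with 32 * 7 * 7 = 1568 values (illustrative only).
rng = np.random.RandomState(0)
features = rng.rand(60, 32 * 7 * 7).astype(np.float32)

# Reduce to 2 dimensions; perplexity must be smaller than the sample count.
tsne = TSNE(n_components=2, perplexity=20, init="pca", random_state=0)
embedding = tsne.fit_transform(features)

print(embedding.shape)  # (60, 2)
```

Each row of embedding is a 2-D point that could then be drawn with matplotlib, for example as a scatter plot colored by the digit label.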
link:
https://morvanzhou.github.io/tutorials/machine-learning/torch/4-01-CNN/
https://github.com/MorvanZhou/PyTorch-Tutorial/blob/master/tutorial-contents/401_CNN.py