LeNet
LeNet is the classic architecture proposed by Yann LeCun in 1998. It has only seven layers, which is tiny by today's standards.
It consists of two convolutional layers, two downsampling (pooling) layers, and three fully connected layers:
That said, shouldn't this count as 5 layers? Normally only layers with parameters are counted, and pooling layers are not. The 7 most likely comes from the paper itself, which counts C1, S2, C3, S4, C5, F6 and the output layer: its subsampling layers S2 and S4 have a trainable coefficient and bias, so they are counted too.
The following is the structure diagram of the network
(The picture is taken from the paper; it is not my own.)
If that is hard to read, here is a higher-resolution version:
(The picture is taken from the wiki; it is not my own.)
PyTorch code:
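import torch
import torch.nn as nn


class LeNet(nn.Module):
    # Expects 1 x 32 x 32 inputs; trailing comments give the feature-map side length.
    def __init__(self):
        super(LeNet, self).__init__()
        # ReLU is used as the activation between layers; the 1998 paper used a
        # tanh-style squashing function. Without any activation, the stacked
        # Linear layers would collapse into a single linear map.
        layer1 = nn.Sequential()
        layer1.add_module('conv1', nn.Conv2d(1, 6, 5, 1, 0))  # 28
        layer1.add_module('relu1', nn.ReLU())
        layer1.add_module('pool1', nn.MaxPool2d(2, 2))  # 14
        self.layer1 = layer1

        layer2 = nn.Sequential()
        layer2.add_module('conv2', nn.Conv2d(6, 16, 5, 1, 0))  # 10
        layer2.add_module('relu2', nn.ReLU())
        layer2.add_module('pool2', nn.MaxPool2d(2, 2))  # 5
        self.layer2 = layer2

        layer3 = nn.Sequential()
        layer3.add_module('fc1', nn.Linear(400, 120))  # 400 = 16 * 5 * 5
        layer3.add_module('relu3', nn.ReLU())
        layer3.add_module('fc2', nn.Linear(120, 84))
        layer3.add_module('relu4', nn.ReLU())
        layer3.add_module('fc3', nn.Linear(84, 10))
        self.layer3 = layer3

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = torch.reshape(x, (x.shape[0], -1))  # flatten to (batch, 400)
        x = self.layer3(x)
        return x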
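The size comments in the LeNet listing (28, 14, 10, 5) can be checked with the usual output-size formula for convolution and pooling layers. The following is a minimal sketch in plain Python, assuming a 32 x 32 input (inferred from the 28 after the first 5 x 5 convolution); the helper name `conv_out` and the `lenet_layers` table are made up for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a conv or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding) for the conv and pooling layers of the listing
lenet_layers = [
    ("conv1", 5, 1, 0),
    ("pool1", 2, 2, 0),
    ("conv2", 5, 1, 0),
    ("pool2", 2, 2, 0),
]

size = 32  # assumed input side length
trace = [size]
for name, kernel, stride, padding in lenet_layers:
    size = conv_out(size, kernel, stride, padding)
    trace.append(size)

# 16 channels come out of conv2, so this is the input width of fc1
flat_features = 16 * size * size
print(trace, flat_features)
```

The last value is why the first fully connected layer takes 400 inputs: 16 channels of 5 x 5 feature maps.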
AlexNet
AlexNet was proposed in 2012. It won that year's ImageNet competition, finishing about 10 percentage points ahead of the runner-up. Compared with LeNet, it is deeper. It also popularized the ReLU activation in deep convolutional networks, and it added Dropout to the fully connected layers to prevent overfitting.
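To make the two ingredients concrete, here is a minimal plain-Python sketch of ReLU and of inverted dropout (the variant that rescales the surviving values during training, which is also how `nn.Dropout` behaves). The function names and the sample values are made up for this illustration; it is not how PyTorch implements them internally.

```python
import random


def relu(values):
    """ReLU keeps positive values and clamps negatives to zero."""
    return [max(0.0, v) for v in values]


def dropout(values, p=0.5, training=True, rng=random):
    """Inverted dropout: zero each value with probability p and scale the
    survivors by 1 / (1 - p); at evaluation time the input passes through."""
    if not training or p == 0.0:
        return list(values)
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]


activations = relu([-2.0, -0.5, 0.0, 1.5, 3.0])
rng = random.Random(0)  # seeded only to make the sketch repeatable
train_out = dropout(activations, p=0.5, training=True, rng=rng)
eval_out = dropout(activations, p=0.5, training=False)
print(activations, train_out, eval_out)
```

Because units are dropped at random during training, the network cannot rely on any single unit, which is the overfitting protection mentioned above.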
Detailed illustration:
The PyTorch code is as follows:
class AlexNet(nn.Module):
    # Expects 1 x 224 x 224 inputs; trailing comments give the feature-map side length.
    def __init__(self):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 96, 11, 4, 1),  # 54
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, 2),  # 26
            nn.Conv2d(96, 256, 5, 1, 2),  # 26
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, 2),  # 12
            nn.Conv2d(256, 384, 3, 1, 1),  # 12
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, 3, 1, 1),  # 12
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, 3, 1, 1),  # 12
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, 2)  # 5
        )
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 5 * 5, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, 10)  # 10 is the number of classes
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.reshape(x, (x.shape[0], -1))  # flatten to (batch, 256 * 5 * 5)
        x = self.classifier(x)
        return x
VggNet
VGGNet was the runner-up of the 2014 ImageNet competition (ILSVRC 2014). It is deeper than AlexNet (16 layers) and uses smaller convolution kernels. It stacks many small filters because a cascade of small filters has the same receptive field as one large filter, while using fewer parameters and allowing a deeper network.
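The receptive-field and parameter argument can be checked with a little arithmetic. The sketch below is plain Python and assumes stride-1 convolutions with the same number of input and output channels in every layer; the function names and the channel width `C = 512` are chosen for this illustration only.

```python
def stacked_receptive_field(kernel, num_layers):
    """Receptive field of num_layers stacked stride-1 convs with the same kernel."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel, channels, num_layers=1):
    """Weight count (biases ignored) when every layer maps channels -> channels."""
    return num_layers * kernel * kernel * channels * channels


C = 512  # assumed channel width

rf_two_3x3 = stacked_receptive_field(3, 2)    # same field as one 5x5 filter
rf_three_3x3 = stacked_receptive_field(3, 3)  # same field as one 7x7 filter

weights_three_3x3 = conv_weights(3, C, num_layers=3)  # 27 * C * C
weights_one_7x7 = conv_weights(7, C, num_layers=1)    # 49 * C * C

print(rf_two_3x3, rf_three_3x3, weights_three_3x3, weights_one_7x7)
```

Three 3x3 layers therefore cover a 7x7 window with 27C^2 weights instead of 49C^2, and they add two extra nonlinearities along the way.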
Network structure diagram:
Detailed diagram:
PyTorch code:
class VGG(nn.Module):
    # Expects 1 x 224 x 224 inputs; trailing comments give the feature-map side length.
    def __init__(self):
        super(VGG, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, 3, 1, 1),  # 224
            nn.ReLU(True),
            nn.Conv2d(64, 64, 3, 1, 1),  # 224
            nn.ReLU(True),
            nn.MaxPool2d(2, 2),  # 112
            nn.Conv2d(64, 128, 3, 1, 1),  # 112
            nn.ReLU(True),
            nn.Conv2d(128, 128, 3, 1, 1),  # 112
            nn.ReLU(True),
            nn.MaxPool2d(2, 2),  # 56
            nn.Conv2d(128, 256, 3, 1, 1),  # 56
            nn.ReLU(True),
            nn.Conv2d(256, 256, 3, 1, 1),  # 56
            nn.ReLU(True),
            nn.Conv2d(256, 256, 3, 1, 1),  # 56
            nn.ReLU(True),
            nn.MaxPool2d(2, 2),  # 28
            nn.Conv2d(256, 512, 3, 1, 1),  # 28
            nn.ReLU(True),
            nn.Conv2d(512, 512, 3, 1, 1),  # 28
            nn.ReLU(True),
            nn.Conv2d(512, 512, 3, 1, 1),  # 28
            nn.ReLU(True),
            nn.MaxPool2d(2, 2),  # 14
            nn.Conv2d(512, 512, 3, 1, 1),  # 14
            nn.ReLU(True),
            nn.Conv2d(512, 512, 3, 1, 1),  # 14
            nn.ReLU(True),
            nn.Conv2d(512, 512, 3, 1, 1),  # 14
            nn.ReLU(True),
            nn.MaxPool2d(2, 2),  # 7
        )
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 10)  # 10 output classes
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.reshape(x, (x.shape[0], -1))  # flatten to (batch, 512 * 7 * 7)
        x = self.classifier(x)
        return x
GoogLeNet
GoogLeNet, also known as Inception v1, goes deeper still than VGG, with 22 layers (why is everyone so fond of depth?). The Google team designed it for the ILSVRC 2014 competition.
Although it is deeper than VGGNet, GoogLeNet has about 12 times fewer parameters than AlexNet and is computationally efficient. This comes from the very effective Inception module and from dropping the large fully connected layers at the end of the network: they are replaced by average pooling followed by a single linear classifier, which greatly reduces the parameter count. It won the 2014 competition.
The Inception module defines a small local network topology, and these modules are then stacked to form the full network. Inside a module, several parallel branches convolve or pool the same input with different receptive fields, and their outputs are concatenated along the channel (depth) dimension to form the module's output.
The following is an illustration of Inception: four 1*1 convolutions, one 3*3 convolution, one 5*5 convolution, and one 3*3 max pooling.
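Since the branch outputs are concatenated along the channel dimension, the output width of an Inception module is simply the sum of the four branch widths. The sketch below checks this in plain Python for the first two Inception modules of the GoogLeNet listing in this post; the function name and its parameter names are made up for the illustration.

```python
def inception_out_channels(c1x1, c3x3_reduce, c3x3, c5x5_reduce, c5x5, c_pool):
    """Channels after concatenating the four branches. The two reduce widths
    only affect the inside of their branch, not the output width."""
    return c1x1 + c3x3 + c5x5 + c_pool


# Widths taken from Inception(192, 64, 96, 128, 16, 32, 32)
first = inception_out_channels(64, 96, 128, 16, 32, 32)
# Widths taken from Inception(256, 128, 128, 192, 32, 96, 64)
second = inception_out_channels(128, 128, 192, 32, 96, 64)

print(first, second)
```

This is why the second module takes 256 input channels and the module after it takes 480.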
Inception code:
import torch
import torch.nn as nn
import torch.nn.functional as F


class BasicConv2d(nn.Module):
    # Convolution + BatchNorm + ReLU
    def __init__(self, in_channels, out_channels, kernel, stride=1, padding=0):
        super(BasicConv2d, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels,
                              kernel_size=kernel, stride=stride, padding=padding,
                              bias=False)
        self.bn = nn.BatchNorm2d(out_channels, eps=0.001)

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        return F.relu(x, inplace=True)


class Inception(nn.Module):
    '''
    in_channels         channels of the input
    out_channels_1x1    width of the 1*1 convolution branch
    out_channels_1x1_3  width of the 1*1 convolution before the 3*3 convolution
    out_channels_3x3    width of the 3*3 convolution
    out_channels_1x1_5  width of the 1*1 convolution before the 5*5 convolution
    out_channels_5x5    width of the 5*5 convolution
    out_channels_pool   width of the 1*1 convolution after the pooling
    '''
    def __init__(self, in_channels, out_channels_1x1,
                 out_channels_1x1_3, out_channels_3x3,
                 out_channels_1x1_5, out_channels_5x5,
                 out_channels_pool):
        super(Inception, self).__init__()
        # Branch 1: 1*1 convolution
        self.branch1x1 = BasicConv2d(in_channels, out_channels_1x1, 1)
        # Branch 2: 1*1 convolution, then 3*3 convolution
        self.branch3x3 = nn.Sequential(
            BasicConv2d(in_channels, out_channels_1x1_3, 1),
            BasicConv2d(out_channels_1x1_3, out_channels_3x3, 3, 1, 1)
        )
        # Branch 3: 1*1 convolution, then 5*5 convolution
        self.branch5x5 = nn.Sequential(
            BasicConv2d(in_channels, out_channels_1x1_5, 1),
            BasicConv2d(out_channels_1x1_5, out_channels_5x5, 5, 1, 2)
        )
        # Branch 4: 3*3 max pooling, then 1*1 convolution
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            BasicConv2d(in_channels, out_channels_pool, 1)
        )

    def forward(self, x):
        branch1x1 = self.branch1x1(x)
        branch3x3 = self.branch3x3(x)
        branch5x5 = self.branch5x5(x)
        branch_pool = self.branch_pool(x)
        output = [branch1x1, branch3x3, branch5x5, branch_pool]
        return torch.cat(output, 1)  # concatenate along the channel dimension
Note: with stride 1, the output keeps the input size when padding = (kernel_size - 1) / 2. So a 1*1 kernel needs padding 0, a 3*3 kernel (including the 3*3 max pooling) needs padding 1, and a 5*5 kernel needs padding 2.
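For stride-1 layers with an odd kernel size, the size-preserving padding follows directly from the output-size formula. A minimal plain-Python check, assuming a 28 x 28 feature map; the helper names `same_padding` and `out_size` are made up for this sketch.

```python
def same_padding(kernel):
    """Padding that keeps the spatial size for a stride-1 layer with an odd kernel."""
    if kernel % 2 == 0:
        raise ValueError("this rule only applies to odd kernel sizes")
    return (kernel - 1) // 2


def out_size(size, kernel, stride, padding):
    """Output side length of a conv or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


paddings = {k: same_padding(k) for k in (1, 3, 5)}
sizes = {k: out_size(28, k, 1, p) for k, p in paddings.items()}

# A 1*1 kernel with padding 1 would grow the map instead of preserving it
grown = out_size(28, 1, 1, 1)

print(paddings, sizes, grown)
```

All four Inception branches must produce the same spatial size, otherwise the final concatenation along the channel dimension fails.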
Detailed explanation:
The reduce on the right side of the table is the added 1*1 convolution kernel
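The point of the 1*1 reduce convolution is to shrink the channel count before the expensive 3*3 or 5*5 convolution. The sketch below compares weight counts in plain Python for the 5*5 branch of the first Inception module in this post (192 input channels, reduce to 16, then 32 output channels); biases are ignored and the helper name `weights` is made up for the illustration.

```python
def weights(c_in, c_out, kernel):
    """Number of weights of a kernel x kernel convolution (biases ignored)."""
    return c_in * c_out * kernel * kernel


# 5*5 convolution applied directly to the 192-channel input
direct = weights(192, 32, 5)

# 1*1 reduce to 16 channels first, then the 5*5 convolution
reduced = weights(192, 16, 1) + weights(16, 32, 5)

print(direct, reduced, direct / reduced)
```

With these widths the reduced branch needs roughly a tenth of the weights of the direct 5*5 convolution.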
Here are a few good blogs with detailed introductions
The full PyTorch code:
# Uses the BasicConv2d and Inception classes defined above.
# Expects 224 x 224 inputs, so that the final 7 x 7 average pooling yields 1 x 1.
class GoogLeNet(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(GoogLeNet, self).__init__()
        # Block 1
        self.block1 = nn.Sequential(
            nn.Conv2d(in_channels, 64, 7, 2, 3),
            nn.MaxPool2d(3, 2, 1)
        )
        # Block 2
        self.block2 = nn.Sequential(
            nn.Conv2d(64, 192, 3, 1, 1),
            nn.Conv2d(192, 192, 3, 1, 1),
            nn.MaxPool2d(3, 2, 1)
        )
        # Block 3
        self.block3 = nn.Sequential(
            Inception(192, 64, 96, 128, 16, 32, 32),
            Inception(256, 128, 128, 192, 32, 96, 64),
            nn.MaxPool2d(3, 2, 1)
        )
        # Block 4
        self.block4 = nn.Sequential(
            Inception(480, 192, 96, 208, 16, 48, 64),
            Inception(512, 160, 112, 224, 24, 64, 64),  # the full version also produces an output here
            Inception(512, 128, 128, 256, 24, 64, 64),
            Inception(512, 112, 144, 288, 32, 64, 64),
            Inception(528, 256, 160, 320, 32, 128, 128),  # the full version also produces an output here
            nn.MaxPool2d(3, 2, 1)
        )
        # Block 5
        self.block5 = nn.Sequential(
            Inception(832, 256, 160, 320, 32, 128, 128),
            Inception(832, 384, 192, 384, 48, 128, 128),
            nn.AvgPool2d(7, 1)
        )
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(1024, out_channels),  # out_channels is the number of classes
        )

    def forward(self, x):
        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x)
        x = self.block4(x)
        x = self.block5(x)
        x = torch.reshape(x, (x.shape[0], -1))  # flatten to (batch, 1024)
        x = self.classifier(x)
        return x
ResNet
ResNet won the 2015 ImageNet competition. Proposed by Microsoft Research, it uses residual modules to successfully train networks up to 152 layers deep.
The idea is as follows: suppose the input to a block is x and the desired output is H(x). If x is passed directly to the output through a shortcut, the layers in the block only need to learn the residual F(x) = H(x) - x, and the block outputs F(x) + x.
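A tiny plain-Python sketch of this idea: the block returns F(x) + x, so when the learned residual F is zero the block is exactly the identity. The function names and the sample vector are made up for the illustration; real blocks apply convolutions rather than these toy functions.

```python
def residual_block(x, residual_fn):
    """Return F(x) + x, where residual_fn plays the role of F."""
    fx = residual_fn(x)
    return [f + xi for f, xi in zip(fx, x)]


def zero_residual(x):
    """F(x) = 0: the block then passes its input through unchanged."""
    return [0.0 for _ in x]


def small_correction(x):
    """F(x) = 0.1 * x: the block only has to learn a small change to x."""
    return [0.1 * xi for xi in x]


x = [1.0, -2.0, 3.0]
identity_out = residual_block(x, zero_residual)
corrected_out = residual_block(x, small_correction)
print(identity_out, corrected_out)
```

Learning "do nothing" is therefore as easy as driving F toward zero, which is one common explanation for why very deep residual networks remain trainable.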
The pytorch code of the residual block is as follows:
# If in_channels == out_channels, same_shape is True.
# If in_channels != out_channels, same_shape is False; the block then halves the
# spatial size and uses a 1x1 convolution on the shortcut.
class BasicBlock(nn.Module):
    def __init__(self, in_channels, out_channels, same_shape=True):
        super(BasicBlock, self).__init__()
        self.same_shape = same_shape
        stride = 1 if self.same_shape else 2
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(True)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, 1, 1)
        self.bn2 = nn.BatchNorm2d(out_channels)
        if not self.same_shape:
            # 1x1 convolution so that the shortcut matches the new shape
            self.conv3 = nn.Conv2d(in_channels, out_channels, 1, stride=stride)

    def forward(self, x):
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        if not self.same_shape:
            x = self.conv3(x)
        out += x  # F(x) + x
        out = self.relu(out)
        return out

Complete network code:
class ResNet(nn.Module):
    # Trailing comments give the feature-map side length for a 224 x 224 input.
    def __init__(self, in_channel):
        super(ResNet, self).__init__()
        self.block1 = nn.Conv2d(in_channel, 64, 7, 2)  # 109
        self.block2 = nn.Sequential(
            nn.MaxPool2d(3, 2),  # 54
            BasicBlock(64, 64),
            BasicBlock(64, 64)
        )
        self.block3 = nn.Sequential(
            BasicBlock(64, 128, False),  # 27
            BasicBlock(128, 128)
        )
        self.block4 = nn.Sequential(
            BasicBlock(128, 256, False),  # 14
            BasicBlock(256, 256)
        )
        self.block5 = nn.Sequential(
            BasicBlock(256, 512, False),  # 7
            BasicBlock(512, 512),
            nn.AvgPool2d(3)  # 2
        )
        # 2048 = 512 * 2 * 2, which matches a 224 x 224 input; 10 output classes
        self.classifier = nn.Linear(2048, 10)

    def forward(self, x):
        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x)
        x = self.block4(x)
        x = self.block5(x)
        x = x.view(x.shape[0], -1)  # flatten to (batch, 2048)
        x = self.classifier(x)
        return x
Unet
Unet is essentially a variant of the FCN (fully convolutional network), a network for pixel-level segmentation.
Deep convolutional networks are good at classification: convolution extracts features from the image, so a large input (the image pixels) is reduced to a small output (the class probabilities). In segmentation, however, the input is the original image and the output has the same size. If we change our point of view and classify every pixel, we get the segmented image as output. So in the end, semantic segmentation is still a classification problem.
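The "classify every pixel" view can be sketched in a few lines of plain Python: given one score map per class, the segmentation mask is the per-pixel argmax over classes. The function name `segment`, the two classes, and the 2 x 2 score values are made up for this illustration.

```python
def segment(scores):
    """scores[c][y][x] is the score of class c at pixel (y, x).
    Return a mask holding the best-scoring class index for each pixel."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


# Two classes on a 2 x 2 image (hypothetical network outputs)
scores = [
    [[0.9, 0.2], [0.6, 0.1]],  # class 0, e.g. background
    [[0.1, 0.8], [0.4, 0.9]],  # class 1, e.g. object
]
mask = segment(scores)
print(mask)
```

The mask has the same height and width as the input, which is exactly the "output is also the original size" property described above.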
The following is the network structure of Unet
The code for the network structure:
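import torch
import torch.nn as nn
from torch.nn import functional as F


class conv(nn.Module):
    # Two stages of 3x3 convolution + BatchNorm + Dropout + LeakyReLU
    def __init__(self, c_in, c_out):
        super(conv, self).__init__()
        self.layer = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, 1, 1),
            nn.BatchNorm2d(c_out),
            # Dropout to reduce overfitting
            nn.Dropout(0.3),
            nn.LeakyReLU(),
            nn.Conv2d(c_out, c_out, 3, 1, 1),
            nn.BatchNorm2d(c_out),
            # Dropout to reduce overfitting
            nn.Dropout(0.3),
            nn.LeakyReLU(),
        )

    def forward(self, x):
        return self.layer(x)


# Downsampling: a stride-2 convolution halves the spatial size
class DownSampling(nn.Module):
    def __init__(self, channel):
        super(DownSampling, self).__init__()
        self.down = nn.Sequential(
            nn.Conv2d(channel, channel, 3, 2, 1)
        )

    def forward(self, x):
        return self.down(x)


# Upsampling: halve the channels, double the spatial size, then append the skip feature map
class UpSampling(nn.Module):
    def __init__(self, channel):
        super(UpSampling, self).__init__()
        self.up = nn.Conv2d(channel, channel // 2, 1, 1)
        self.conv_tf = nn.ConvTranspose2d(channel // 2, channel // 2, 4, 2, 1)

    def forward(self, x, r):
        x = self.up(x)
        # x = F.interpolate(x, scale_factor=2, mode="nearest")
        x = self.conv_tf(x)
        return torch.cat((x, r), 1)  # concatenate along the channel dimension only


class Unet(nn.Module):
    # Input height and width should be multiples of 32 (five downsampling steps).
    def __init__(self):
        super(Unet, self).__init__()
        # self.c1 = nn.Conv2d(3, 64, 3, 1, 1)
        self.layer1 = conv(3, 64)
        self.layer2 = nn.Sequential(
            DownSampling(64),
            conv(64, 128),
        )
        self.layer3 = nn.Sequential(
            DownSampling(128),
            conv(128, 256),
        )
        self.layer4 = nn.Sequential(
            DownSampling(256),
            conv(256, 512),
        )
        self.layer5 = nn.Sequential(
            DownSampling(512),
            conv(512, 1024),
        )
        self.layer6 = nn.Sequential(
            DownSampling(1024),
            conv(1024, 2048),
        )
        self.layer_up_1 = UpSampling(2048)
        self.c1 = conv(2048, 1024)
        self.layer_up_2 = UpSampling(1024)
        self.c2 = conv(1024, 512)
        self.layer_up_3 = UpSampling(512)
        self.c3 = conv(512, 256)
        self.layer_up_4 = UpSampling(256)
        self.c4 = conv(256, 128)
        self.layer_up_5 = UpSampling(128)
        self.c5 = conv(128, 64)
        self.layer_up_6 = nn.Sequential(
            nn.Conv2d(64, 3, 3, 1, 1),
            nn.Sigmoid()
        )

    # Forward pass as implied by the layer definitions above
    def forward(self, x):
        # Encoder: keep each level's output for the skip connections
        r1 = self.layer1(x)
        r2 = self.layer2(r1)
        r3 = self.layer3(r2)
        r4 = self.layer4(r3)
        r5 = self.layer5(r4)
        r6 = self.layer6(r5)
        # Decoder: upsample, concatenate the matching encoder output, then convolve
        x = self.c1(self.layer_up_1(r6, r5))
        x = self.c2(self.layer_up_2(x, r4))
        x = self.c3(self.layer_up_3(x, r3))
        x = self.c4(self.layer_up_4(x, r2))
        x = self.c5(self.layer_up_5(x, r1))
        return self.layer_up_6(x)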
Unet++
This is the structure of Unet++
class VGGBlock(nn.Module):
    def __init__(self, in_channels, middle_channels, out_channels):
        super().__init__()
        self.relu = nn.ReLU(inplace=True)
        self.conv1 = nn.Conv2d(in_channels, middle_channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(middle_channels)
        self.conv2 = nn.Conv2d(middle_channels, out_channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(out_channels)

    def forward(self, x):
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        return out


class NestedUNet(nn.Module):
    def __init__(self, num_classes, input_channels=3, deep_supervision=False, **kwargs):
        super().__init__()
        nb_filter = [32, 64, 128, 256, 512]
        self.deep_supervision = deep_supervision
        self.pool = nn.MaxPool2d(2, 2)
        self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)
        self.conv0_0 = VGGBlock(input_channels, nb_filter[0], nb_filter[0])
        self.conv1_0 = VGGBlock(nb_filter[0], nb_filter[1], nb_filter[1])
        self.conv2_0 = VGGBlock(nb_filter[1], nb_filter[2], nb_filter[2])
        self.conv3_0 = VGGBlock(nb_filter[2], nb_filter[3], nb_filter[3])
        self.conv4_0 = VGGBlock(nb_filter[3], nb_filter[4], nb_filter[4])
        self.conv0_1 = VGGBlock(nb_filter[0]+nb_filter[1], nb_filter[0], nb_filter[0])
        self.conv1_1 = VGGBlock(nb_filter[1]+nb_filter[2], nb_filter[1], nb_filter[1])
        self.conv2_1 = VGGBlock(nb_filter[2]+nb_filter[3], nb_filter[2], nb_filter[2])
        self.conv3_1 = VGGBlock(nb_filter[3]+nb_filter[4], nb_filter[3], nb_filter[3])
        self.conv0_2 = VGGBlock(nb_filter[0]*2+nb_filter[1], nb_filter[0], nb_filter[0])
        self.conv1_2 = VGGBlock(nb_filter[1]*2+nb_filter[2], nb_filter[1], nb_filter[1])
        self.conv2_2 = VGGBlock(nb_filter[2]*2+nb_filter[3], nb_filter[2], nb_filter[2])
        self.conv0_3 = VGGBlock(nb_filter[0]*3+nb_filter[1], nb_filter[0], nb_filter[0])
        self.conv1_3 = VGGBlock(nb_filter[1]*3+nb_filter[2], nb_filter[1], nb_filter[1])
        self.conv0_4 = VGGBlock(nb_filter[0]*4+nb_filter[1], nb_filter[0], nb_filter[0])
        if self.deep_supervision:
            self.final1 = nn.Conv2d(nb_filter[0], num_classes, kernel_size=1)
            self.final2 = nn.Conv2d(nb_filter[0], num_classes, kernel_size=1)
            self.final3 = nn.Conv2d(nb_filter[0], num_classes, kernel_size=1)
            self.final4 = nn.Conv2d(nb_filter[0], num_classes, kernel_size=1)
        else:
            self.final = nn.Conv2d(nb_filter[0], num_classes, kernel_size=1)

    def forward(self, input):
        print('input:', input.shape)
        x0_0 = self.conv0_0(input)
        print('x0_0:', x0_0.shape)
        x1_0 = self.conv1_0(self.pool(x0_0))
        print('x1_0:', x1_0.shape)
        x0_1 = self.conv0_1(torch.cat([x0_0, self.up(x1_0)], 1))
        print('x0_1:', x0_1.shape)
        x2_0 = self.conv2_0(self.pool(x1_0))
        print('x2_0:', x2_0.shape)
        x1_1 = self.conv1_1(torch.cat([x1_0, self.up(x2_0)], 1))
        print('x1_1:', x1_1.shape)
        x0_2 = self.conv0_2(torch.cat([x0_0, x0_1, self.up(x1_1)], 1))
        print('x0_2:', x0_2.shape)
        x3_0 = self.conv3_0(self.pool(x2_0))
        print('x3_0:', x3_0.shape)
        x2_1 = self.conv2_1(torch.cat([x2_0, self.up(x3_0)], 1))
        print('x2_1:', x2_1.shape)
        x1_2 = self.conv1_2(torch.cat([x1_0, x1_1, self.up(x2_1)], 1))
        print('x1_2:', x1_2.shape)
        x0_3 = self.conv0_3(torch.cat([x0_0, x0_1, x0_2, self.up(x1_2)], 1))
        print('x0_3:', x0_3.shape)
        x4_0 = self.conv4_0(self.pool(x3_0))
        print('x4_0:', x4_0.shape)
        x3_1 = self.conv3_1(torch.cat([x3_0, self.up(x4_0)], 1))
        print('x3_1:', x3_1.shape)
        x2_2 = self.conv2_2(torch.cat([x2_0, x2_1, self.up(x3_1)], 1))
        print('x2_2:', x2_2.shape)
        x1_3 = self.conv1_3(torch.cat([x1_0, x1_1, x1_2, self.up(x2_2)], 1))
        print('x1_3:', x1_3.shape)
        x0_4 = self.conv0_4(torch.cat([x0_0, x0_1, x0_2, x0_3, self.up(x1_3)], 1))
        print('x0_4:', x0_4.shape)
        if self.deep_supervision:
            output1 = self.final1(x0_1)
            output2 = self.final2(x0_2)
            output3 = self.final3(x0_3)
            output4 = self.final4(x0_4)
            return [output1, output2, output3, output4]
        else:
            output = self.final(x0_4)
            return output
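The input widths of the nested blocks in the listing follow one pattern: node x{i}_{j} (for j >= 1) concatenates the j earlier feature maps of its own level with one upsampled map from the level below. Bilinear upsampling does not change the channel count, so the width is j * nb_filter[i] + nb_filter[i + 1]. A plain-Python check of that pattern, with the helper name `node_in_channels` made up for the sketch:

```python
nb_filter = [32, 64, 128, 256, 512]  # per-level widths from the listing


def node_in_channels(i, j):
    """Input channels of node x{i}_{j} for j >= 1: j same-level feature maps
    plus one upsampled map from level i + 1."""
    if j < 1:
        raise ValueError("backbone nodes (j == 0) take the pooled previous level instead")
    return j * nb_filter[i] + nb_filter[i + 1]


channels = {
    "x0_1": node_in_channels(0, 1),
    "x0_4": node_in_channels(0, 4),
    "x1_3": node_in_channels(1, 3),
    "x3_1": node_in_channels(3, 1),
}
print(channels)
```

These values match the first argument of the corresponding VGGBlock constructors, for example nb_filter[0]*4+nb_filter[1] for conv0_4.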