Use PyTorch simple implementation convolution neural network model

  Here we will achieve three simple convolution neural network model with Python: LeNet  , AlexNet  , VGGNet , first of all we need to know three basic data set: MNIST data set, Cifar data sets and data sets ImageNet

Three basic data set

MNIST data set

  MNIST data set is used as a data set of handwriting recognition. MNIST training data set contains 60,000 pictures, 10,000 test pictures. Where each picture is a numbers 0 to 9. Image size is 28 × 28. Since the data of the data set is relatively simple, manual annotation error rate was 0.2%.

Cifar data set

  Cifar image classification data set is a data set. Divided Cifar-10 Cifar-100 and the two data sets. Cifar dataset picture for the 32 × 32 color pictures, and these pictures by Professor Alex Krizhenevsky, Dr. Vinod Nair and Professor Geoffrey Hilton finishing. Cifar-10 data sets collected 60,000 images from 10 different species, these species are: airplanes, cars, birds, cats, deer, dogs, frogs, horses, boats and trucks. Cifar-10 in the data set, manual annotation accuracy was 94%.

ImageNet data set

  ImageNet data set is a large image data sets by Stanford University professor Li Feifei take the lead from finishing. In ImageNet, nearly 15 million related to the picture in WordNet 20000 ranking synonym sets. ImageNet held annually in computer vision related to race, ImageNet data set covering all research in computer vision, which is used for image classification data set is ILSVRC2012 image classification data set.

  Data ILSVRC2012 dataset Cifar-10 and consistent data set, the main object identification images, which contains 1.2 million images from 1000 type, each image belong to only one species, the size from a few thousand to several million bytes byte range. Convolution neural network is also in this dataset a famous battle.

 

Three classical convolution neural network model

  Three classical convolution neural network model: LeNet , AlexNet , VGGNet . These three structures convolution neural network is not particularly complex, the following were achieved using PyTorch

LeNet model

  LeNet specifically refers LeNet-5. LeNet-5 model is a professor Yann LeCun in 1998 applied to document recognition set forth in the papers Gradient-based learning, it is the first successfully applied to the problem of digital identification convolution neural network. On MNIST dataset, LeNet-5 model can achieve accuracy of about 99.2%. LeNet-5 model a total of seven layers including two layers convolution, two cell layers, two fully connected layers and an output layer, the lower diagram shows the architecture of LeNet-5 model.

 Use PyTorch achieve

class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1   = nn.Linear(16*5*5, 120)
        self.fc2   = nn.Linear(120, 84)
        self.fc3   = nn.Linear(84, 10)
    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = F.max_pool2d(out, 2)
        out = F.relu(self.conv2(out))
        out = F.max_pool2d(out, 2)
        out = out.view(out.size(0), -1)
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))
        out = self.fc3(out)
        return out
PyTorch 实现 LeNet 模型

 

AlexNet模型

  Alex Krizhevsky 提出了卷积神经网络模型 AlexNet。AlexNet 在卷积神经网络上成功地应用了 Relu,Dropout 和 LRN 等技巧。在 ImageNet 竞赛上,AlexNet 以领先第二名 10% 的准确率而夺得冠军。成功地展示了深度学习的威力。它的网络结构如下:

  由于当时GPU计算能力不强,AlexNet使用了两个GPU并行计算,现在可以用一个GPU替换。以单个GPU的AlexNet模型为例,包括有:5个卷积层,3个池化层,3个全连接层。其中卷积层和全连接层包括有relu层,在全连接层中还有dropout层。具体参数的配置可以看下图

 使用 PyTorch 实现

class AlexNet(nn.Module):
    def __init__(self, num_classes):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 256, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )
    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), 256 * 6 * 6)
        x = self.classifier(x)
        return x
PyTorch 实现 AlexNet 模型

 

VGGNet模型

  VGGNet是牛津大学计算机视觉组和Google DeepMind公司的研究人员一起研发的一种卷积神经网络。通过堆叠3×3 的小型卷积核和2×2的最大池化层,VGGNet成功地构筑了最深达19层的卷积神经网络。由于VGGNet的拓展性强,迁移到其他图片数据上的泛化性比较好,可用作迁移学习。

下图为 VGG16 的整体架构图

  从左至右,一张彩色图片输入到网络,白色框是卷积层,红色是池化,蓝色是全连接层,棕色框是预测层。预测层的作用是将全连接层输出的信息转化为相应的类别概率,而起到分类作用。

可以看到 VGG16 是13个卷积层+3个全连接层叠加而成。

使用 PyTorch 实现

cfg = {
    'VGG11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'VGG19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}
class VGG(nn.Module):
    def __init__(self, vgg_name):
        super(VGG, self).__init__()
        self.features = self._make_layers(cfg[vgg_name])
        self.classifier = nn.Linear(512, 10)
    def forward(self, x):
        out = self.features(x)
        out = out.view(out.size(0), -1)
        out = self.classifier(out)
        return out
    def _make_layers(self, cfg):
        layers = []
        in_channels = 3
        for x in cfg:
            if x == 'M':
                layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
            else:
                layers += [nn.Conv2d(in_channels, x, kernel_size=3, padding=1),
                           nn.BatchNorm2d(x),
                           nn.ReLU(inplace=True)]
                in_channels = x
        layers += [nn.AvgPool2d(kernel_size=1, stride=1)]
        return nn.Sequential(*layers)
PyTorch 实现 VGGNet 模型

 

  VGG16 是基于大量真实图像的 ImageNet 图像库预训练的网络

  VGG16 对应的供 keras 使用的模型人家已经帮我们训练好,我们将学习好的 VGG16 的权重迁移(transfer)到自己的卷积神经网络上作为网络的初始权重,这样我们自己的网络不用从头开始从大量的数据里面训练,从而提高训练速度。这里的迁移就是平时所说的迁移学习。

from keras.models import Sequential
from keras.layers.core import Flatten, Dense, Dropout
from keras.layers.convolutional import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras.optimizers import SGD
import cv2, numpy as np


def VGG_16(weights_path=None):
    model = Sequential()
    model.add(ZeroPadding2D((1, 1), input_shape=(3, 244, 244))) # 卷积输入层指定输入图像大小(网络开始输入(3,224,224)的图像数据,即一张宽224,高244的彩色RGB图片)
    model.add(Convolution2D(64, 3, 3, activation='relu')) # 64个3*3的卷积核,生成64*244*244的图像
    model.add(ZeroPadding2D(1, 1)) # 补0,(1, 1)表示横向和纵向都补0,保证卷积后图像大小不变,可以用padding='valid'参数代替
    model.add(Convolution2D(64, 3, 3, activation='relu')) # 再次卷积操作,生成64*244*244的图像,激活函数是relu
    model.add(MaxPooling2D((2, 2), strides=(2, 2))) # pooling操作,相当于变成64*112*112,小矩阵是(2,2),步长(2,2),指的是横向每次移动2格,纵向每次移动2格。

    # 再往下,同理,只不过是卷积核个数依次变成128,256,512,而每次按照这样池化之后,矩阵都要缩小一半。
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(128, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(128, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2))) # 128*56*56

    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(256, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(256, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(256, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))  # 256*28*28

    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))  # 512*14*14

    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))  # 128*7*7
    # 13层卷积和池化之后,数据变成了512 * 7 * 7

    model.add(Flatten) # 压平上述向量,变成一维512*7*7=25088
    # 三个全连接层
    '''
    4096只是个经验值,其他数当然可以,试试效果,只要不要小于要预测的类别数,这里要预测的类别有1000种,
    所以最后预测的全连接有1000个神经元。如果你想用VGG16 给自己的数据作分类任务,这里就需要改成你预测的类别数。
    '''
    model.add(Dense(4096, activation='relu')) # 全连接层有4096个神经元,参数个数是4096*25088
    model.add(Dropout(0.5)) # 0.5的概率抛弃一些连接
    model.add(Dense(4096, activation='relu'))  # 全连接层有4096个神经元,参数个数是4096*25088
    model.add(Dropout(0.5))  # 0.5的概率抛弃一些连接
    model.add(Dense(1000, activation='softmax'))

    if weights_path:
        model.load_weights(weights_path)
    return model

# 如果要设计其他类型的CNN网络,就是将这些基本单元比如卷积层个数、全连接个数按自己需要搭配变换,
使用 Keras 实现 VGG16
# 模型需要事先下载
model = VGG_16('vgg16_weights.h5')

sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(optimizer=sgd, loss='categorical_crossentropy')

# 加载图片
def load_image(img):
    im = cv2.resize(cv2.imread(img), (224, 224)).astype(np.float32)
    im[:,:,0] -= 103.939
    im[:,:,1] -= 116.779
    im[:,:,2] -= 123.68
    im = im.transpose((2, 0, 1))
    im = np.expand_dims(im, axis=0)
    return im

# 读取vgg16类别的文件
f = open('synset_words.txt', 'r')
lines = f.readline()
f.close()

def predict(url):
    im = load_image(url)
    pre = np.argmax(model.predict(im))
    print(lines[pre])
    
# 测试图片
predict('xx.jpg')
测试 VGG16 模型

 

                      

 

Guess you like

Origin www.cnblogs.com/zhuminghui/p/11533539.html