Using the Convolutional Neural Network LeNet5 for License Plate Recognition: Network Parameter Configuration Explained (Python+PaddlePaddle)

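LeNet5 contains convolutional layers, pooling layers and fully connected layers. That makes it a little more complex than a DNN, which has only fully connected layers, but it also performs better than a DNN. Let's look at why.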

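Problems with a DNN:

  1. The network structure is not flexible enough
    The image size determines the number of nodes in the input layer. A 16×16 image needs 256 input nodes. Images keep getting larger, and if the same task is done with 100×100 images, the only way to cope is to add more neurons per layer or add more layers.

  2. The network has too many parameters
    Using the same example: a 16×16 input image gives 256 input nodes, each hidden layer has 1000 nodes, and the output layer has 10. With 5 layers in total (input, three hidden, output), the network has to learn $256\times10^3+10^6+10^6+10^4$ weights $w$ plus $1000+1000+1000+10$ biases $b$.
    With larger images, this number becomes enormous.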
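To double-check the arithmetic, here is a small Python sketch that counts the weights and biases of a fully connected network from its layer sizes. The helper name dense_param_count is made up for this illustration. It is applied to the 16×16 example and to a 100×100 input with the same hidden layers.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the number of nodes per layer, input layer first.
    Each pair of adjacent layers is joined by a full weight matrix.
    """
    weights = sum(a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))
    biases = sum(layer_sizes[1:])  # the input layer has no biases
    return weights, biases

# 16x16 input, three hidden layers of 1000 nodes, 10 outputs (5 layers in total)
small = dense_param_count([16 * 16, 1000, 1000, 1000, 10])
# Same hidden/output layers, but a 100x100 input image
large = dense_param_count([100 * 100, 1000, 1000, 1000, 10])

print(small)  # (2266000, 3010)
print(large)  # (12010000, 3010)
```

Only the first weight matrix depends on the image size, yet at 100×100 it already accounts for 10 million of the roughly 12 million weights.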

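That is why convolutional neural networks came about.

(Image source: PaddlePaddle)
LeNet5 is one of the earliest convolutional neural networks. To understand it, let's first look at what each of these three kinds of layers does: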
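  1. Convolutional layer:
    Take a 6×6 single-channel image convolved with a 3×3 filter, with the stride set to 1 by default. The filter is placed over the top-left 3×3 patch of the image, the overlapping values are multiplied element by element, and the products are summed to give the first output value.
    Because the stride is 1, the next step slides the filter one column to the right and repeats the computation. When it reaches the right edge, it moves down one row and starts again from the left.
    In the end, the 6×6 image becomes a 4×4 output (6 − 3 + 1 = 4 positions in each direction).

If there are 3 kernels, you get three feature outputs, one per kernel.
In other words, one kernel extracts one kind of feature from the image, and multiple kernels extract multiple kinds of features.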
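The sliding-window computation described above can be sketched in NumPy. This is an illustrative, unoptimized version. The function name conv2d_single and the three example kernels are assumptions for the demo, not the filters LeNet5 actually learns. Like most deep learning frameworks, it computes cross-correlation, meaning the kernel is not flipped. Applying three kernels to one 6×6 image yields three 4×4 feature maps.

```python
import numpy as np

def conv2d_single(image, kernel, stride=1):
    """Valid (no padding) 2D convolution of one single-channel image with one kernel.

    Computes cross-correlation (the kernel is not flipped), as deep learning
    frameworks usually do.
    """
    h, w = image.shape
    kh, kw = kernel.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)  # multiply element-wise, then sum
    return out

image = np.arange(36, dtype=float).reshape(6, 6)   # a 6x6 single-channel "image"
kernels = [
    np.array([[1, 0, -1]] * 3, dtype=float),        # responds to vertical edges
    np.array([[1, 0, -1]] * 3, dtype=float).T,      # responds to horizontal edges
    np.ones((3, 3)) / 9.0,                          # 3x3 mean (blur) filter
]

# One 4x4 output map per kernel; stacking them gives a (3, 4, 4) feature tensor
features = np.stack([conv2d_single(image, k) for k in kernels])
print(features.shape)  # (3, 4, 4)
```

Each kernel produces its own feature map, which is exactly what num_filters controls in the Conv2D layers further down.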

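  2. Pooling layer:
    Enlarging or shrinking a picture does not change what we recognize it as: a small picture of an apple and a much larger one are both clearly apples.

Therefore, the job of the pooling layer is to reduce the size of the output. This step is also called downsampling.

With a 2×2 pooling window and a stride of 2, each non-overlapping 2×2 block is reduced to a single value, so the width and height are halved.
There are two pooling methods: max pooling takes the maximum of each window, and average pooling takes the mean.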
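A minimal NumPy sketch of max and average pooling on a single feature map. pool2d is a hypothetical helper written for this example, not Paddle's Pool2D. With a 2×2 window and stride 2, a 4×4 map shrinks to 2×2. The last line shows that a stride of 1, as used by the pooling layers in MyLeNet later on, shrinks a 16×16 map only to 15×15.

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    """Pool a single 2D feature map with a size x size window.

    mode="max" keeps the largest value in each window; any other value
    averages the window instead.
    """
    h, w = x.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.zeros((out_h, out_w))
    reducer = np.max if mode == "max" else np.mean
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = reducer(window)
    return out

x = np.array([[1, 3, 2, 1],
              [4, 6, 5, 0],
              [7, 2, 9, 8],
              [1, 0, 3, 4]], dtype=float)

max_pooled = pool2d(x, mode="max")  # [[6, 5], [7, 9]]
avg_pooled = pool2d(x, mode="avg")  # [[3.5, 2.0], [2.5, 6.0]]

# Overlapping windows (stride 1), as in MyLeNet: 16x16 -> 15x15
stride1_shape = pool2d(np.zeros((16, 16)), size=2, stride=1).shape
print(max_pooled, avg_pooled, stride1_shape, sep="\n")
```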

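  3. Fully connected layer:
    Finally, the fully connected layer produces the output. Note that:

  • input_dim must equal the number of columns of the flattened input matrix (one row per image)
  • output_dim equals the number of labels (classes)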
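To make these two rules concrete, here is a NumPy sketch of the computation a fully connected layer performs. It uses random weights and a batch of 4 instead of 128. Both choices are assumptions for illustration. The sketch shows the shapes involved, not Paddle's Linear implementation. The flattened column count is what input_dim must match, and the number of output columns is output_dim.

```python
import numpy as np

rng = np.random.default_rng(0)

batch_size, channels, height, width = 4, 32, 10, 10
num_classes = 65  # number of label classes in this article's dataset

features = rng.standard_normal((batch_size, channels, height, width))

# Flatten each sample: [N, C, H, W] -> [N, C*H*W]; C*H*W is the column count
flat = features.reshape(batch_size, -1)
input_dim = flat.shape[1]  # must match the layer's input_dim (32*10*10 = 3200)

# Fully connected layer: y = x @ W + b, with W of shape (input_dim, output_dim)
W = rng.standard_normal((input_dim, num_classes)) * 0.01
b = np.zeros(num_classes)
logits = flat @ W + b

# Softmax turns each row into a probability distribution over the 65 classes
exps = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exps / exps.sum(axis=1, keepdims=True)
print(input_dim, probs.shape)  # 3200 (4, 65)
```

If input_dim did not equal the number of flattened columns, the matrix product flat @ W would fail with a shape mismatch. This is the same constraint the Linear layer enforces.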

Let's look at the code:

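# Imports needed to run this snippet on its own (PaddlePaddle 1.x fluid dygraph API)
import paddle.fluid as fluid
from paddle.fluid.dygraph import Conv2D, Pool2D, Linear

# Define the LeNet5 network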
class MyLeNet(fluid.dygraph.Layer):
    def __init__(self):
        super(MyLeNet,self).__init__()
        self.hidden1_1 = Conv2D(num_channels=1, num_filters=28, filter_size=5, act='relu')
        self.hidden1_2 = Pool2D(pool_size=2, pool_stride=1, pool_type='max')
        self.hidden2_1 = Conv2D(num_channels=28, num_filters=32, filter_size=3, act='relu')
        self.hidden2_2 = Pool2D(pool_size=2, pool_stride=1, pool_type='max')
        self.hidden3 = Conv2D(num_channels=32, num_filters=32, filter_size=3, act='relu')
        self.hidden4 = Linear(input_dim=32*10*10, output_dim=65, act='softmax')

    def forward(self,input):
        # print(input.shape)
        x = self.hidden1_1(input)
        # print(x.shape)
        x = self.hidden1_2(x)
        # print(x.shape)
        x = self.hidden2_1(x)
        # print(x.shape)
        x = self.hidden2_2(x)
        # print(x.shape)
        x = self.hidden3(x)
        # print(x.shape)
        x = fluid.layers.reshape(x, shape=[-1, 32*10*10])
        # print(x.shape)
        y = self.hidden4(x)
        # print(y.shape)

        return y

Uncomment the print statements to see the shape of the data after each layer.

Here is a detailed explanation:

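  1. The input is [128,1,20,20]. 128 is the batch_size, i.e., one batch holds 128 images. 1 is the number of channels, i.e., single-channel grayscale images. The two 20s mean the image size is 20×20.
  2. After the convolutional layer (num_channels=1, num_filters=28, filter_size=5), the shape becomes [128,28,16,16]. No images are lost, so 128 stays the same in all subsequent layers. num_filters is the number of kernels, and each kernel extracts one feature, so 28 here is the number of features (feature maps). The kernel is 5×5 and the stride defaults to 1, so the output side length is 20-5+1=16. If the arithmetic is unclear, draw it out. Also, because the image is single-channel, num_channels=1.
  3. Next comes the pooling layer (pool_size=2, pool_stride=1, pool_type='max'), whose output shape is [128,28,15,15]. Pooling does not change the number of feature maps, so both 128 and 28 stay the same. The pooling window pool_size is 2 and the stride is 1 (not 2), so 16-2+1=15 and each map becomes 15×15.
  4. After three convolutional layers and two pooling layers, it is easy to work out that the shape is now [128,32,10,10] (15-3+1=13, 13-2+1=12, 12-3+1=10). To feed the fully connected layer (input_dim=32 * 10 * 10, output_dim=65), the tensor is reshaped with x = fluid.layers.reshape(x, shape=[-1, 32 * 10 * 10]), so the shape becomes [128,3200].
  5. After the fully connected layer with output_dim=65, the 3200 features of each image become 65 values, one per label, giving [128,65]. 65 is the total number of labels.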
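The shape arithmetic in the list above can be reproduced with plain Python. The helper conv_out and the layers table are written for this illustration. They apply the usual output-size formula (n + 2p - k) // s + 1. With no padding and stride 1, this reduces to the n - k + 1 used above.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

# (layer name, kernel size, stride, output channels or None to keep channels)
layers = [
    ("conv1 (hidden1_1)", 5, 1, 28),
    ("pool1 (hidden1_2)", 2, 1, None),
    ("conv2 (hidden2_1)", 3, 1, 32),
    ("pool2 (hidden2_2)", 2, 1, None),
    ("conv3 (hidden3)", 3, 1, 32),
]

shape = [128, 1, 20, 20]  # [batch, channels, height, width]
for name, k, s, out_c in layers:
    n, c, h, w = shape
    channels = out_c if out_c is not None else c  # pooling keeps channels
    shape = [n, channels, conv_out(h, k, s), conv_out(w, k, s)]
    print(name, shape)

flat_dim = shape[1] * shape[2] * shape[3]
print("reshape", [shape[0], flat_dim])  # [128, 3200]
print("fc (hidden4)", [shape[0], 65])
```

Changing a kernel size or stride in the table shows immediately which input_dim the Linear layer would need.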

Below is the complete code of the program:

# Import the required packages
import numpy as np
import paddle
import paddle.fluid as fluid
import paddle.fluid as fluid
from PIL import Image
import cv2
import matplotlib.pyplot as plt
import os
from multiprocessing import cpu_count
from paddle.fluid.dygraph import Pool2D,Conv2D
# from paddle.fluid.dygraph import FC
from paddle.fluid.dygraph import Linear

# Generate the lists of license plate character images
data_path = '/home/aistudio/data'
character_folders = os.listdir(data_path)
label = 0
LABEL_temp = {}
if(os.path.exists('./train_data.list')):
    os.remove('./train_data.list')
if(os.path.exists('./test_data.list')):
    os.remove('./test_data.list')
for character_folder in character_folders:
    with open('./train_data.list', 'a') as f_train:
        with open('./test_data.list', 'a') as f_test:
            if character_folder == '.DS_Store' or character_folder == '.ipynb_checkpoints' or character_folder == 'data23617':
                continue
            print(character_folder + " " + str(label))
            LABEL_temp[str(label)] = character_folder  # Store the label -> folder name mapping
            character_imgs = os.listdir(os.path.join(data_path, character_folder))
            for i in range(len(character_imgs)):
                if i%10 == 0: 
                    f_test.write(os.path.join(os.path.join(data_path, character_folder), character_imgs[i]) + "\t" + str(label) + '\n')
                else:
                    f_train.write(os.path.join(os.path.join(data_path, character_folder), character_imgs[i]) + "\t" + str(label) + '\n')
    label = label + 1
print('Image lists generated')

# Define readers for the character training and test sets using the image lists generated above
def data_mapper(sample):
    img, label = sample
    img = paddle.dataset.image.load_image(file=img, is_color=False)
    img = img.flatten().astype('float32') / 255.0
    return img, label
def data_reader(data_list_path):
    def reader():
        with open(data_list_path, 'r') as f:
            lines = f.readlines()
            for line in lines:
                img, label = line.split('\t')
                yield img, int(label)
    return paddle.reader.xmap_readers(data_mapper, reader, cpu_count(), 1024)

# Data provider for training
train_reader = paddle.batch(reader=paddle.reader.shuffle(reader=data_reader('./train_data.list'), buf_size=512), batch_size=128)
# Data provider for testing
test_reader = paddle.batch(reader=data_reader('./test_data.list'), batch_size=128)

# Define the LeNet5 network
class MyLeNet(fluid.dygraph.Layer):
    def __init__(self):
        super(MyLeNet,self).__init__()
        self.hidden1_1 = Conv2D(num_channels=1, num_filters=28, filter_size=5, act='relu')
        self.hidden1_2 = Pool2D(pool_size=2, pool_stride=1, pool_type='max')
        self.hidden2_1 = Conv2D(num_channels=28, num_filters=32, filter_size=3, act='relu')
        self.hidden2_2 = Pool2D(pool_size=2, pool_stride=1, pool_type='max')
        self.hidden3 = Conv2D(num_channels=32, num_filters=32, filter_size=3, act='relu')
        self.hidden4 = Linear(input_dim=32*10*10, output_dim=65, act='softmax')

    def forward(self,input):
        # print(input.shape)
        x = self.hidden1_1(input)
        # print(x.shape)
        x = self.hidden1_2(x)
        # print(x.shape)
        x = self.hidden2_1(x)
        # print(x.shape)
        x = self.hidden2_2(x)
        # print(x.shape)
        x = self.hidden3(x)
        # print(x.shape)
        x = fluid.layers.reshape(x, shape=[-1, 32*10*10])
        # print(x.shape)
        y = self.hidden4(x)
        # print(y.shape)

        return y

with fluid.dygraph.guard():
    model=MyLeNet() # Instantiate the model
    model.train() # Training mode
    opt=fluid.optimizer.SGDOptimizer(learning_rate=0.001, parameter_list=model.parameters()) # SGD (stochastic gradient descent) optimizer, learning rate 0.001
    epochs_num=20 # Number of training epochs: 20

    for pass_num in range(epochs_num):

        for batch_id,data in enumerate(train_reader()):
            images=np.array([x[0].reshape(1,20,20) for x in data],np.float32)
            # print(images)
            labels = np.array([x[1] for x in data]).astype('int64')
            labels = labels[:, np.newaxis]
            image=fluid.dygraph.to_variable(images)
            label=fluid.dygraph.to_variable(labels)
            # print(image.shape)
            predict=model(image) # Forward pass (prediction)

            loss=fluid.layers.cross_entropy(predict,label)
            avg_loss=fluid.layers.mean(loss) # Average loss over the batch

            acc=fluid.layers.accuracy(predict,label) # Batch accuracy

            if batch_id!=0 and batch_id%50==0:
                print("train_pass:{},batch_id:{},train_loss:{},train_acc:{}".format(pass_num,batch_id,avg_loss.numpy(),acc.numpy()))

            avg_loss.backward()
            opt.minimize(avg_loss)
            model.clear_gradients()

    fluid.save_dygraph(model.state_dict(),'MyLeNet') # Save the model parameters

# Model evaluation
with fluid.dygraph.guard():
    accs = []
    model=MyLeNet() # Instantiate the model
    model_dict,_=fluid.load_dygraph('MyLeNet')
    model.load_dict(model_dict) # Load the saved parameters
    model.eval() # Evaluation mode
    for batch_id,data in enumerate(test_reader()): # Test set
        images=np.array([x[0].reshape(1,20,20) for x in data],np.float32)
        labels = np.array([x[1] for x in data]).astype('int64')
        labels = labels[:, np.newaxis]

        image=fluid.dygraph.to_variable(images)
        label=fluid.dygraph.to_variable(labels)

        predict=model(image) # Predict
        acc=fluid.layers.accuracy(predict,label)
        accs.append(acc.numpy()[0])
    avg_acc = np.mean(accs) # Mean of the per-batch accuracies, computed once after the loop
    print(avg_acc)

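# Process the license plate image: segment each character of the plate and save it
license_plate = cv2.imread('./车牌.png')
gray_plate = cv2.cvtColor(license_plate, cv2.COLOR_BGR2GRAY)  # cv2.imread loads images in BGR channel order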
ret, binary_plate = cv2.threshold(gray_plate, 175, 255, cv2.THRESH_BINARY)
result = []
for col in range(binary_plate.shape[1]):
    result.append(0)
    for row in range(binary_plate.shape[0]):
        result[col] = result[col] + binary_plate[row][col]/255
character_dict = {}
num = 0
i = 0
while i < len(result):
    if result[i] == 0:
        i += 1
    else:
        index = i + 1
        while index < len(result) and result[index] != 0:  # also stop at the right edge of the image
            index += 1
        character_dict[num] = [i, index-1]
        num += 1
        i = index

for i in range(8):
    if i==2:
        continue
    padding = (170 - (character_dict[i][1] - character_dict[i][0])) / 2
    ndarray = np.pad(binary_plate[:,character_dict[i][0]:character_dict[i][1]], ((0,0), (int(padding), int(padding))), 'constant', constant_values=(0,0))
    ndarray = cv2.resize(ndarray, (20,20))
    cv2.imwrite('./' + str(i) + '.png', ndarray)
    
def load_image(path):
    img = paddle.dataset.image.load_image(file=path, is_color=False)
    img = img.astype('float32')
    img = img[np.newaxis, ] / 255.0
    return img

# Convert the folder-name labels into display characters
print('Label:',LABEL_temp)
match = {'A':'A','B':'B','C':'C','D':'D','E':'E','F':'F','G':'G','H':'H','I':'I','J':'J','K':'K','L':'L','M':'M','N':'N',
        'O':'O','P':'P','Q':'Q','R':'R','S':'S','T':'T','U':'U','V':'V','W':'W','X':'X','Y':'Y','Z':'Z',
        'yun':'云','cuan':'川','hei':'黑','zhe':'浙','ning':'宁','jin':'津','gan':'赣','hu':'沪','liao':'辽','jl':'吉','qing':'青','zang':'藏',
        'e1':'鄂','meng':'蒙','gan1':'甘','qiong':'琼','shan':'陕','min':'闽','su':'苏','xin':'新','wan':'皖','jing':'京','xiang':'湘','gui':'贵',
        'yu1':'渝','yu':'豫','ji':'冀','yue':'粤','gui1':'桂','sx':'晋','lu':'鲁',
        '0':'0','1':'1','2':'2','3':'3','4':'4','5':'5','6':'6','7':'7','8':'8','9':'9'}
L = 0
LABEL ={}

for V in LABEL_temp.values():
    LABEL[str(L)] = match[V]
    L += 1
print(LABEL)

# Run prediction in dygraph mode
with fluid.dygraph.guard():
    model=MyLeNet() # Instantiate the model
    model_dict,_=fluid.load_dygraph('MyLeNet')
    model.load_dict(model_dict) # Load the saved parameters
    model.eval() # Evaluation mode
    lab=[]
    for i in range(8):
        if i==2:
            continue
        infer_imgs = []
        infer_imgs.append(load_image('./' + str(i) + '.png'))
        infer_imgs = np.array(infer_imgs)
        infer_imgs = fluid.dygraph.to_variable(infer_imgs)
        result=model(infer_imgs)
        lab.append(np.argmax(result.numpy()))
# print(lab)


display(Image.open('./车牌.png'))  # display() is available in Jupyter/IPython notebooks such as AI Studio
print('\nRecognized license plate: ',end='')
for i in range(len(lab)):
    print(LABEL[str(lab[i])],end='')

Let's look at the result: the program displays the plate image and then prints the recognized plate characters.
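The character segmentation in the program above works by a vertical projection. After binarization, it counts the foreground pixels in each column and treats each run of non-empty columns as one character. Below is a self-contained restatement on a tiny synthetic image. The function name segment_columns and the test image are hypothetical, written for this illustration. Unlike the original loop, it also handles a character that touches the right edge of the image.

```python
import numpy as np

def segment_columns(binary):
    """Split a binarized image into character spans using a vertical projection.

    binary: 2D array with 255 for foreground (character) pixels and 0 for background.
    Returns a list of (start, end) column indices, with end inclusive.
    """
    projection = (binary > 0).sum(axis=0)  # foreground pixels per column
    spans = []
    start = None
    for col, count in enumerate(projection):
        if count > 0 and start is None:
            start = col                     # entering a character
        elif count == 0 and start is not None:
            spans.append((start, col - 1))  # leaving a character
            start = None
    if start is not None:                   # character touches the right edge
        spans.append((start, len(projection) - 1))
    return spans

# Synthetic 5x12 "plate" with three blobs of foreground pixels
plate = np.zeros((5, 12), dtype=np.uint8)
plate[1:4, 1:3] = 255
plate[0:5, 5:6] = 255
plate[2:3, 8:12] = 255
print(segment_columns(plate))  # [(1, 2), (5, 5), (8, 11)]
```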


Reposted from blog.csdn.net/zbp_12138/article/details/105285873