The Road to Iris Recognition (Part 1): Iris Recognition with Off-the-Shelf Deep CNNs

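Today I read an IEEE paper, "Iris Recognition With Off-the-Shelf CNN Features: A Deep Learning Perspective," and these are my notes.
The paper applies existing (off-the-shelf) CNN architectures to iris recognition, then analyzes and compares them.
First, the existing deep learning frameworks for iris recognition are DeepIris and two variants of DeepIrisNet. DeepIris has 9 layers:
- one pairwise filter layer
- one convolutional layer
- two pooling layers
- two normalization layers
- two local layers
- one fully connected layer

This architecture performed well on the Q-FIRE and CASIA datasets.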
Paper:
N. Liu, M. Zhang, H. Li, Z. Sun, and T. Tan, "DeepIris: Learning pairwise
filter bank for heterogeneous iris verification," Pattern Recognit. Lett.,
vol. 82, no. 2, pp. 154–161, 2015.

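DeepIrisNet comes in two variants.
DeepIrisNet-A contains:
- 8 convolutional layers, each followed by a batch normalization layer
- 4 pooling layers
- 3 fully connected layers
- 2 dropout layers

DeepIrisNet-B adds two inception layers on top of DeepIrisNet-A to increase its modeling capability. Both networks perform well on the ND-IRIS-0405 and ND-CrossSensor-Iris-2013 datasets.
Limitation: the available iris datasets are too small to train deep neural networks from scratch. Transfer learning is one way around this lack of data.
Paper:
A. Gangwar and A. Joshi, "DeepIrisNet: Deep iris representation with
applications in iris recognition and cross-sensor iris recognition," in Proc.
IEEE Int. Conf. Image Process. (ICIP), Sep. 2016, pp. 2301–2305.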

AlexNet

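Paper: M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional
networks," in Proc. Eur. Conf. Comput. Vis. (ECCV), Aug. 2014,
pp. 818–833.
We extract the outputs of all convolutional layers (5) and all fully connected layers (2) to generate CNN features for the iris recognition task.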

VGG

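Paper: K. Simonyan and A. Zisserman, "Very deep convolutional networks for
large-scale image recognition," CoRR, pp. 1–14, Sep. 2014.

We extract the outputs of all convolutional layers (16) and the fully connected layers (2) to generate CNN features for the iris recognition task.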

GoogLeNet

Papers:
C. Szegedy et al., "Going deeper with convolutions," in Proc. IEEE Conf.
Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 1–9.

C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking
the inception architecture for computer vision," in Proc. IEEE Conf.
Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 2818–2826.

C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, "Inception-v4,
Inception-ResNet and the impact of residual connections on learning," in
Proc. AAAI Conf. Artif. Intell., 2017, pp. 4278–4284.

We extract the outputs of all convolutional layers (5) and all inception layers (12) to generate CNN features for the iris recognition task.

ResNet

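Paper: K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image
recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR),
Jun. 2016, pp. 770–778.
We extract the outputs of the convolutional layer (1) and all bottleneck convolutional layers (17) to generate CNN features for the iris recognition task.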

DenseNet

Paper:
G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, "Densely
connected convolutional networks," in Proc. IEEE Conf. Comput. Vis.
Pattern Recognit. (CVPR), Jul. 2017, pp. 2261–2269.

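DenseNet connects each layer to every other layer in a feed-forward fashion.
Thanks to these dense connections, the architecture has several advantages:
it alleviates the vanishing-gradient problem,
strengthens feature propagation,
encourages feature reuse,
and substantially reduces the number of parameters.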
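The dense connectivity pattern can be sketched as a small PyTorch module. Each layer receives the concatenation of the block input and all earlier layer outputs, and adds only `growth_rate` new channels. That small per-layer width is where the parameter savings come from. This is a simplified block: each layer is BN, ReLU and a 3×3 conv, without the 1×1 bottleneck or the transition layers of the full DenseNet. The sizes used are hypothetical.

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """BN -> ReLU -> 3x3 conv that produces `growth_rate` new feature maps."""

    def __init__(self, in_channels: int, growth_rate: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(),
            nn.Conv2d(in_channels, growth_rate, kernel_size=3, padding=1, bias=False),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)


class DenseBlock(nn.Module):
    """Layer i sees the block input plus the outputs of layers 0..i-1."""

    def __init__(self, num_layers: int, in_channels: int, growth_rate: int):
        super().__init__()
        self.layers = nn.ModuleList(
            [DenseLayer(in_channels + i * growth_rate, growth_rate) for i in range(num_layers)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = [x]
        for layer in self.layers:
            # Dense connection: concatenate everything produced so far along channels.
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)


block = DenseBlock(num_layers=4, in_channels=16, growth_rate=12)
out = block(torch.rand(2, 16, 32, 32))
print(out.shape)  # torch.Size([2, 64, 32, 32])
```

The output has 16 + 4 × 12 = 64 channels. Every earlier feature map is passed forward unchanged, which is what "feature reuse" refers to.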

We extract the outputs of the selected dense layers (15) to generate CNN features for the iris recognition task.

Other architectures

Papers:
A. Canziani, A. Paszke, and E. Culurciello, "An analysis of deep neural
network models for practical applications," CoRR, pp. 1–7, May 2016.

J. Gu et al., "Recent advances in convolutional neural networks," CoRR,
pp. 187332, Dec. 2017.

Segmentation

First, the iris is localized by extracting two circular contours that mark the inner and outer boundaries of the iris region.

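The integro-differential operator is one of the most widely used circle detectors. It can be written as

$$\max_{(r,\,x_0,\,y_0)} \left| G_\sigma(r) * \frac{\partial}{\partial r} \oint_{r,\,x_0,\,y_0} \frac{I(x,y)}{2\pi r}\, ds \right|$$

where:
- I(x, y) is the eye image,
- (x_0, y_0) and r are the center and radius of a candidate circle,
- G_σ(r) is a Gaussian smoothing function of scale σ,
- * denotes convolution.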
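A minimal NumPy sketch of the integro-differential operator follows. It rests on illustrative assumptions, not on the paper's or any toolkit's implementation:
- The contour integral is approximated by nearest-pixel sampling of 128 points per circle.
- The search covers a short list of candidate centers rather than a full coarse-to-fine search over (x_0, y_0).
- σ = 1.

```python
import numpy as np

def circle_mean(image, x0, y0, r, n_points=128):
    """Approximate the normalized contour integral of I(x, y) over a circle (mean intensity)."""
    theta = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    xs = np.clip(np.rint(x0 + r * np.cos(theta)).astype(int), 0, image.shape[1] - 1)
    ys = np.clip(np.rint(y0 + r * np.sin(theta)).astype(int), 0, image.shape[0] - 1)
    return image[ys, xs].mean()

def integro_differential(image, centers, radii, sigma=1.0):
    """Return the (x0, y0, r) maximizing |G_sigma(r) * d/dr circle_mean(I)|."""
    half = int(3 * sigma)
    k = np.arange(-half, half + 1)
    gauss = np.exp(-k**2 / (2 * sigma**2))
    gauss /= gauss.sum()                         # 1-D Gaussian kernel G_sigma
    if len(radii) - 1 < gauss.size:
        raise ValueError("need more candidate radii than the smoothing kernel length")

    best, best_score = None, -np.inf
    for x0, y0 in centers:
        profile = np.array([circle_mean(image, x0, y0, r) for r in radii])
        deriv = np.diff(profile)                 # discrete d/dr
        smoothed = np.convolve(deriv, gauss, mode="same")
        i = int(np.argmax(np.abs(smoothed)))
        if abs(smoothed[i]) > best_score:
            best_score = abs(smoothed[i])
            best = (x0, y0, radii[i + 1])        # diff[i] is the jump from radii[i] to radii[i + 1]
    return best

# Usage on a synthetic eye: a dark "pupil" disk of radius 20 centered at (50, 50).
yy, xx = np.mgrid[0:101, 0:101]
img = np.ones((101, 101))
img[(xx - 50) ** 2 + (yy - 50) ** 2 <= 20 ** 2] = 0.0
candidates = [(45, 50), (50, 50), (55, 50), (50, 45), (50, 55)]
print(integro_differential(img, candidates, np.arange(5, 45)))
```

Off-center candidates spread the intensity jump over many radii, so their smoothed derivative peak is lower. The true center and radius win.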
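Eyelids: the contour is changed from a circle to an arc to localize the eyelids.
A noise mask separates iris pixels from non-iris pixels, such as eyelids and eyelashes.
The noise mask is generated during segmentation and used in the subsequent steps.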

Normalization

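The size of the region enclosed by the inner and outer iris boundaries can vary because the pupil dilates and constricts.
To handle this, Daugman proposed the rubber sheet model, which remaps the segmented iris onto a fixed-size rectangular region.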

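This is done by remapping each point of the iris region I(x, y) from the raw Cartesian coordinates (x, y) to dimensionless polar coordinates (r, θ):

$$I\big(x(r,\theta),\,y(r,\theta)\big) \rightarrow I(r,\theta)$$

$$x(r,\theta) = (1-r)\,x_p(\theta) + r\,x_l(\theta), \qquad y(r,\theta) = (1-r)\,y_p(\theta) + r\,y_l(\theta)$$

where r ∈ [0, 1] and θ ∈ [0, 2π). (x_p(θ), y_p(θ)) is the point on the pupil boundary in direction θ, and (x_l(θ), y_l(θ)) is the point on the limbic (outer iris) boundary in the same direction.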

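Normalization also turns eye rotation, which can occur between the two images being matched, into a simple translation along the angular axis.
The corresponding noise mask is normalized in the same way.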
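The rubber sheet mapping can be sketched in NumPy. The sketch assumes circular inner and outer boundaries given as (x_center, y_center, radius) and uses nearest-neighbour sampling. The 64 × 256 output size is a hypothetical choice. The same function is applied to the noise mask so that image and mask stay aligned.

```python
import numpy as np

def rubber_sheet(image, pupil, iris, n_radial=64, n_angular=256):
    """Daugman-style unwrapping between two circular boundaries.

    pupil, iris: (x_center, y_center, radius) of the inner and outer circles.
    Returns an (n_radial, n_angular) array sampled at r in [0, 1], theta in [0, 2*pi).
    """
    r = np.linspace(0.0, 1.0, n_radial)[:, None]                          # column vector
    theta = np.linspace(0.0, 2.0 * np.pi, n_angular, endpoint=False)[None, :]

    # Boundary points x_p(theta), y_p(theta) and x_l(theta), y_l(theta).
    xp = pupil[0] + pupil[2] * np.cos(theta)
    yp = pupil[1] + pupil[2] * np.sin(theta)
    xl = iris[0] + iris[2] * np.cos(theta)
    yl = iris[1] + iris[2] * np.sin(theta)

    # Linear interpolation between the two boundaries (the rubber sheet model).
    x = (1.0 - r) * xp + r * xl
    y = (1.0 - r) * yp + r * yl

    # Nearest-neighbour sampling, clipped to the image bounds.
    xi = np.clip(np.rint(x).astype(int), 0, image.shape[1] - 1)
    yi = np.clip(np.rint(y).astype(int), 0, image.shape[0] - 1)
    return image[yi, xi]

rng = np.random.default_rng(0)
eye = rng.random((200, 200))
noise_mask = eye > 0.05   # hypothetical mask: True = usable iris pixel
normalized = rubber_sheet(eye, pupil=(100, 100, 30), iris=(102, 99, 80))
mask_normalized = rubber_sheet(noise_mask, pupil=(100, 100, 30), iris=(102, 99, 80))
print(normalized.shape, mask_normalized.shape)  # (64, 256) (64, 256)
```

Because each column corresponds to one angle θ, rotating the eye shifts the columns of `normalized` circularly. This is the "rotation becomes translation" property mentioned above.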

CNN feature extraction

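The normalized iris image is then fed into the CNN feature extraction module. Five CNN architectures are used to extract features from the normalized images. To compare their performance, the output of each layer is used as a feature descriptor, and the corresponding recognition accuracy is reported.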

SVM classification

Paper: B. Schölkopf and A. J. Smola, Learning With Kernels: Support Vector
Machines, Regularization, Optimization, and Beyond. Cambridge, MA,
USA: MIT Press, 2001.

A multi-class SVM is used as the classifier.
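A rough scikit-learn sketch of the layer-by-layer comparison (one accuracy per layer) follows. `SVC` is scikit-learn's LIBSVM-based classifier and handles multiple classes with a one-vs-one scheme. The following are assumptions for illustration, since these notes do not give the paper's exact evaluation protocol:
- a linear kernel
- 3-fold cross-validation
- a synthetic dataset with two made-up "layers"

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def evaluate_layers(layer_features, labels, cv=3):
    """Score every layer's descriptors with a linear SVM; return {layer: mean accuracy}."""
    scores = {}
    for name, feats in layer_features.items():
        clf = SVC(kernel="linear")  # scikit-learn's SVC is built on LIBSVM
        scores[name] = cross_val_score(clf, feats, labels, cv=cv).mean()
    return scores

# Hypothetical data: 5 subjects x 6 images each, described by two fake "layers".
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(5), 6)
class_centers = rng.normal(size=(5, 32))
layer_features = {
    "layer_1": rng.normal(size=(30, 32)),                                # pure noise
    "layer_2": class_centers[labels] + 0.1 * rng.normal(size=(30, 32)),  # class-separable
}
scores = evaluate_layers(layer_features, labels)
best = max(scores, key=scores.get)
print(scores, "best layer:", best)
```

Picking the layer with the highest score mirrors how the analysis reports a peak layer for each network.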

Datasets

LG2200 dataset
CASIA-Iris-Thousand

Performance metric

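Recognition rate
The recognition rate is the proportion of samples correctly classified at a predefined false accept rate (FAR), here FAR = 0.1%.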
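One common way to compute a rate at a fixed FAR is sketched in NumPy below, under an explicit assumption: the recognition rate is read as the fraction of genuine comparisons accepted at the threshold where only 0.1% of impostor comparisons are accepted. This is an interpretation, not confirmed from the paper. Scores are treated as similarities (higher means "same iris"), and the score distributions are synthetic.

```python
import numpy as np

def rate_at_far(genuine, impostor, far=0.001):
    """Fraction of genuine scores accepted at the threshold that yields the target FAR.

    Scores are similarities: a comparison is accepted when its score is above the threshold.
    """
    impostor = np.sort(np.asarray(impostor, dtype=float))
    # Threshold such that at most `far` of the impostor scores lie strictly above it.
    k = int(np.floor(far * impostor.size))
    threshold = impostor[impostor.size - k - 1]
    return float(np.mean(np.asarray(genuine) > threshold))

rng = np.random.default_rng(1)
impostor = rng.uniform(0.0, 0.5, size=10_000)   # hypothetical impostor scores
genuine = rng.uniform(0.4, 1.0, size=1_000)     # hypothetical genuine scores
rate = rate_at_far(genuine, impostor, far=0.001)
print(f"recognition rate at FAR=0.1%: {rate:.3f}")
```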
The baseline feature descriptor used for comparison is Gabor phase-quadrant features matched with the Hamming distance.
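The phase-quadrant baseline can be sketched in NumPy. The sketch does two things:
- It quantizes each complex Gabor response into 2 bits, the signs of its real and imaginary parts.
- It compares two codes with a masked fractional Hamming distance, minimized over circular shifts along the angular axis. This is how the rotation-to-translation property of normalization is used in matching.

The Gabor filtering itself is not shown (the complex responses are assumed given). `max_shift=8` is a hypothetical value.

```python
import numpy as np

def phase_quadrant_code(response):
    """Quantize a complex Gabor response into 2 bits per sample (signs of real and imaginary parts)."""
    return np.stack([response.real >= 0, response.imag >= 0], axis=-1)

def hamming_distance(code_a, mask_a, code_b, mask_b, max_shift=8):
    """Minimum masked fractional Hamming distance over circular shifts along axis 1 (angle)."""
    best = 1.0
    for s in range(-max_shift, max_shift + 1):
        cb = np.roll(code_b, s, axis=1)
        mb = np.roll(mask_b, s, axis=1)
        valid = mask_a & mb               # compare only bits that are valid in both codes
        n = valid.sum()
        if n == 0:
            continue
        hd = np.logical_xor(code_a, cb)[valid].sum() / n
        best = min(best, hd)
    return best

rng = np.random.default_rng(0)
resp = rng.normal(size=(8, 64)) + 1j * rng.normal(size=(8, 64))
code = phase_quadrant_code(resp)
mask = np.ones_like(code, dtype=bool)
rotated = np.roll(code, 3, axis=1)        # simulate eye rotation as a column shift
print(hamming_distance(code, mask, rotated, mask))  # 0.0
other = phase_quadrant_code(rng.normal(size=(8, 64)) + 1j * rng.normal(size=(8, 64)))
print(hamming_distance(code, mask, other, mask))
```

The rotated copy matches perfectly once the shift is undone, while an unrelated code stays near 0.5. The noise mask removes occluded bits from both the numerator and the denominator.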

Experimental setup

USIT v2.2
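PyTorch: two key features of this framework are dynamic computational graphs and imperative programming, which make coding deep networks more flexible and powerful.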

For classification, we use scikit-learn's Python wrapper of the LIBSVM library, which makes it easy to integrate with the feature extraction step.

Analysis

For the LG2200 dataset:
layer 10 for VGG, layer 10 for Inception, layer 11 for ResNet and layer 6 for DenseNet
For the CASIA-Iris-Thousand dataset:
layer 9 for VGG, layer 10 for Inception, layer 12 for ResNet and layer 5 for DenseNet.

Among all five CNNs, DenseNet achieves the highest peak recognition accuracy of 98.7% at layer 6 on the LG2200 dataset and 98.8% at layer 5 on the CASIA-Iris-Thousand dataset.
ResNet and Inception achieve similar peak recognition accuracies of 98.0% and 98.2% at layers 11 and 10, respectively, on the LG2200 dataset;
and 98.5% and 98.3% at layers 12 and 10, respectively, on the CASIA-Iris-Thousand dataset.

VGG, with its simple architecture, only achieves 92.7% and 93.1% recognition accuracy, both at layer 9, on the LG2200 and CASIA-Iris-Thousand datasets, respectively.

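Open problems

- Computational complexity
- Domain adaptation and fine-tuning
- Few-shot learning
- Architecture evolution
- Other deep architectures: DBN (deep belief networks), SAE (stacked autoencoders), RNN (recurrent neural networks)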

I'm just getting started, so that's roughly it for now. I'll keep updating; let's keep exploring together!