Ordinal Regression with Multiple Output CNN for Age Estimation

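The paper proposes an end-to-end deep network that combines a CNN with ordinal regression for age estimation. The network architecture is shown below.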

[Figure: network architecture]

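The input is a $60\times60\times3$ image. The first three layers of the network are convolutional: the first two each consist of convolution + ReLU + batchnorm + pooling, and the third of convolution + ReLU + batchnorm. They are followed by two fully connected (FC) layers. The first FC layer has 80 outputs. Because age prediction covers 100 classes (ages 1 to 100), the second FC layer has 200 outputs, with each pair of outputs acting as one binary classifier.

The training data has the form $D=\{x_i, y_i\}_{i=1}^N$, where $x_i$ is the input image and $y_i$ is the corresponding age label. For the $k$-th binary classification problem the training data is $D^k=\{x_i, y_i^k, w_i^k\}_{i=1}^N$, where $y_i^k\in\{0,1\}$ is the binary label and $w_i^k$ is the weight of the sample in that problem. Age classification is thus converted into 100 subproblems, each predicting whether the sample's age is greater than $k \in \{1, 2, 3, \ldots, 100\}$ years: if it is, the label for that subproblem is 1, otherwise 0.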
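The label conversion described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `encode_ordinal` and the parameter `num_tasks` are made up for this example and are not taken from the paper's code.

```python
def encode_ordinal(age, num_tasks=100):
    """Turn an integer age into num_tasks binary labels.

    Entry k-1 answers the k-th subproblem: 1 if age > k, else 0.
    (Hypothetical helper; the name and signature are assumptions.)
    """
    return [1 if age > k else 0 for k in range(1, num_tasks + 1)]


labels = encode_ordinal(3)
print(labels[:5])       # [1, 1, 0, 0, 0]
print(1 + sum(labels))  # 3, the original age is recovered
```

With this encoding the number of ones is always `age - 1`, which is why adding 1 to the sum of the binary outputs gives back an age.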


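Finally, the predicted age is computed from the outputs of all the binary subproblems:

$predict_{age} = 1 + \sum_{k=1}^{K-1} f_k(x')$

where $f_k(x') \in \{0,1\}$ is the output of the $k$-th binary classifier for a test image $x'$, and the sum runs over the $K-1$ binary classifiers (here, the 100 subproblems described above).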
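As a rough sketch of this decoding step, the snippet below walks over a flat list of FC scores two at a time and counts how many binary classifiers answer "yes". The layout of each pair (first score for label 0, second for label 1, pairs stored next to each other) is an assumption made for illustration; the actual ordering in the authors' layer may differ, and `predict_age` is a hypothetical name.

```python
def predict_age(fc_output):
    """Decode an age from 2*T scores, one (score_0, score_1) pair per task.

    Assumes adjacent pairs and that index 1 of each pair is the
    "older than k" score; both are assumptions for this sketch.
    """
    if len(fc_output) % 2 != 0:
        raise ValueError("expected an even number of outputs")
    age = 1
    for k in range(0, len(fc_output), 2):
        score_no, score_yes = fc_output[k], fc_output[k + 1]
        # f_k(x') = 1 when the "yes" score wins the pair
        if score_yes > score_no:
            age += 1
    return age


# 29 classifiers say "yes", the remaining 71 say "no"
scores = [0.1, 0.9] * 29 + [0.9, 0.1] * 71
print(predict_age(scores))  # 30
```

Because the binary outputs are simply summed, the rule still returns a valid age even when the classifiers are not perfectly consistent with each other.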

Computing the loss function

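Cross-entropy loss is used, with a separate weight $w_i^k$ assigned to each class. For a single binary task (dropping the task index $k$) the loss is

$E_s = -\frac{1}{N}\sum_{i=1}^{N} 1\{o_i = y_i\}\, w_i \log p(o_i \mid x_i, W)$

where $1\{\cdot\}$ is the indicator function, $o_i$ is the output for sample $x_i$, and $W$ denotes the network parameters.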
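To make the single-task loss concrete, here is a minimal pure-Python sketch. It assumes the two scores of a task are turned into probabilities with a softmax, and it reads the indicator as "pick the log-probability of the true label"; the helper names `softmax2` and `single_task_loss` are invented for this example.

```python
import math


def softmax2(score_0, score_1):
    """Numerically stable softmax over one pair of scores."""
    m = max(score_0, score_1)
    e0, e1 = math.exp(score_0 - m), math.exp(score_1 - m)
    return e0 / (e0 + e1), e1 / (e0 + e1)


def single_task_loss(scores, labels, weights):
    """Weighted cross-entropy E_s for one binary task.

    scores:  list of (score_0, score_1) pairs, one per sample
    labels:  list of 0/1 ground-truth labels y_i
    weights: list of per-sample weights w_i
    """
    total = 0.0
    for (s0, s1), y, w in zip(scores, labels, weights):
        probs = softmax2(s0, s1)
        total += w * math.log(probs[y])  # only the true label contributes
    return -total / len(labels)


# One sample with equal scores: p = 0.5, so the loss is log(2)
print(single_task_loss([(0.0, 0.0)], [1], [1.0]))
```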

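With $T=100$ subtasks, the final loss function is

$E_m = -\frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T} \lambda^t\, 1\{o_i^t = y_i^t\}\, w_i^t \log p(o_i^t \mid x_i, W^t)$

where $\lambda^t$ is the data weight of the $t$-th subtask, that is, the share of the data for each age in the whole dataset, computed as $\lambda^t = \frac{\sqrt{N_t}}{\sum_{i=1}^{T}\sqrt{N_i}}$, and $w_i^t$ is the weight of each class within each binary subproblem, which can simply be set to 1. The parameters are then optimized by backpropagation.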
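The task weights and the multi-task loss can be sketched as follows. This is an illustration under stated assumptions, not the authors' layer: `counts[t]` is taken to be the number of samples associated with task `t`, each task is summarized by a single probability of label 1, and the names `task_weights` and `multi_task_loss` are hypothetical.

```python
import math


def task_weights(counts):
    """lambda^t = sqrt(N_t) / sum_i sqrt(N_i) for a list of counts N_t."""
    roots = [math.sqrt(n) for n in counts]
    total = sum(roots)
    return [r / total for r in roots]


def multi_task_loss(probs, labels, lambdas, weights=None):
    """Multi-task cross-entropy E_m.

    probs[i][t]:  predicted probability that sample i has label 1 on task t
    labels[i][t]: ground-truth 0/1 label of sample i on task t
    lambdas[t]:   task weight lambda^t
    weights:      optional w_i^t; defaults to 1 everywhere
    """
    n = len(probs)
    total = 0.0
    for i in range(n):
        for t, lam in enumerate(lambdas):
            # probability assigned to the true label of task t
            p_true = probs[i][t] if labels[i][t] == 1 else 1.0 - probs[i][t]
            w = 1.0 if weights is None else weights[i][t]
            total += lam * w * math.log(p_true)
    return -total / n


lambdas = task_weights([1, 4, 9])  # square roots 1, 2, 3
print(lambdas)
print(multi_task_loss([[0.9, 0.8, 0.3]], [[1, 1, 0]], lambdas))
```

Taking the square root dampens the influence of very frequent ages compared with weighting by raw counts, while still giving them more weight than rare ones.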

Code analysis

The paper provides Caffe code.

The input is in HDF5 format:

layer {
    top: "data"
    type: "HDF5Data"
    top: "label"
    name: "data"
    hdf5_data_param {
        source: "../data/SourceDataForCaffe/normal_Tutu_morph_wiki/hdf5_data_gray/train.txt"
        batch_size: 256
    }
    include {
        phase: TRAIN
    }
}
layer {
    top: "data"
    top: "label"
    name: "data"
    type: "HDF5Data"

    hdf5_data_param {
        source: "../data/SourceDataForCaffe/normal_Tutu_morph_wiki/hdf5_data_gray/test.txt"
        batch_size: 256
    }
    include {
        phase: TEST
    }
}

For converting the data to HDF5, the following code can serve as a reference:

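import os
import sys

import h5py
import matplotlib.pyplot as plt
import numpy as np

IMAGE_SIZE = (60, 60)   # (height, width) expected for every input image
MEAN_VALUE = 128        # subtracted from each pixel; assumes 0-255 pixel values

# Text file listing one sample per line: "<image path> <fx> <fy>"
filename = sys.argv[1]
# splitext copes with extra dots in the path, unlike filename.split('.')
setname, ext = os.path.splitext(filename)

with open(filename, 'r') as f:
    # Skip blank lines so the unpacking below does not fail on them
    lines = [line for line in f if line.strip()]

np.random.shuffle(lines)

sample_size = len(lines)
imgs = np.zeros((sample_size, 1,) + IMAGE_SIZE, dtype=np.float32)
freqs = np.zeros((sample_size, 2), dtype=np.float32)

for i, line in enumerate(lines):
    image_name, fx, fy = line.split()
    # Keep only the first channel, giving a single-channel image
    img = plt.imread(image_name)[:, :, 0].astype(np.float32)
    img = img.reshape((1, ) + img.shape)
    img -= MEAN_VALUE
    imgs[i] = img
    freqs[i] = [float(fx), float(fy)]
    if (i + 1) % 1000 == 0:
        print('Processed {} images!'.format(i + 1))

h5_filename = '{}.h5'.format(setname)
with h5py.File(h5_filename, 'w') as h:
    h.create_dataset('data', data=imgs)
    # Caffe's HDF5Data layer looks datasets up by top name, so for the
    # layers above (tops "data" and "label") this one would be named 'label'.
    h.create_dataset('freq', data=freqs)

# Caffe's hdf5_data_param.source points at a text file listing the .h5 files
with open('{}_h5.txt'.format(setname), 'w') as f:
    f.write(h5_filename)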

These are followed by the convolutional and FC layers, and finally the loss layer:

layer {
    bottom: "fc_output"
    bottom: "label"
    top: "loss"
    name: "loss"
    type: "OrdinalRegressionLoss"
    include { 
        phase: TRAIN 
    }
    ordinal_regression_loss_param {
        weight_file: "../data/weight_file/weight_file_normal.txt"
    }
} 
layer {
    bottom: "fc_output"
    bottom: "label"
    top: "loss"
    name: "loss"
    type: "OrdinalRegressionLoss"
    include { 
        phase: TEST
    }
    ordinal_regression_loss_param {
        weight_file: "../data/weight_file/weight_file_normal.txt"
    }
}

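Because Caffe has no built-in loss function for multi-task training, the authors wrote a corresponding custom layer, so Caffe has to be recompiled with it. See: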

https://github.com/kongsicong/Age_recognition_OR/tree/master/OrdinalRegression

After that, training can begin.

Training a TensorFlow model

A corresponding TensorFlow model was written and trained on age-classification data stored in LMDB. The test result is:

Mean age error: 8.
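The post does not say how the mean age error is computed. A common choice for age estimation is the mean absolute error (MAE) between predicted and true ages, and the sketch below assumes that definition; the function name and the sample numbers are made up for illustration and are not results from this experiment.

```python
def mean_absolute_error(predicted, actual):
    """Average of |predicted age - true age| over all samples."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Illustrative values only: errors of 5, 2 and 3 years average to 10/3
print(mean_absolute_error([25, 40, 63], [30, 38, 60]))
```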

Reference code on GitHub: https://github.com/kongsicong/Age_recognition_OR