MTCNN-Tensorflow


git clone https://github.com/AITTSMD/MTCNN-Tensorflow

MTCNN is trained as a multi-task network; the bounding-box and landmark annotations come from two separate datasets:

Dataset 1 is annotated with face bounding boxes only, so it is used only to train face detection.

Dataset 2 is annotated with both bounding boxes and facial landmarks, and is used to train the landmark regression.

Each line of the input data has the format:

path to image, cls_label, bbox_label, landmark_label

For dataset 1, candidate boxes are randomly cropped and, according to their IoU with the ground-truth bbox, assigned to the positive, negative, or part sample set. Dataset 1 has no landmark annotations, so the landmark label is padded with zeros.

For dataset 2, the landmarks are extracted and the data is augmented by crop, flip, and rotate. At prediction time the landmarks are predicted from the predicted box, so the input bbox label is padded with zeros.

During training, a label value marks whether the current sample is used to train the box classification, the box regression, or the landmarks:

For pos samples, cls_label=1, bbox_label is computed, landmark_label=[0,0,0,0,0,0,0,0,0,0].

For part samples, cls_label=-1, bbox_label is computed, landmark_label=[0,0,0,0,0,0,0,0,0,0].

For landmark samples, cls_label=-2, bbox_label=[0,0,0,0], landmark_label is computed.

For neg samples, cls_label=0, bbox_label=[0,0,0,0], landmark_label=[0,0,0,0,0,0,0,0,0,0].
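
For illustration, with the zero padding applied, four samples conceptually look like the lines below (paths and numeric values are made up; in the actual list files the trailing zeros may be omitted and filled in when the tfrecords are generated):

12/positive/0.jpg 1 -0.05 0.12 0.08 -0.03 0 0 0 0 0 0 0 0 0 0
12/part/1.jpg -1 0.21 -0.18 0.25 0.30 0 0 0 0 0 0 0 0 0 0
12/train_PNet_landmark_aug/2.jpg -2 0 0 0 0 0.31 0.35 0.68 0.34 0.50 0.55 0.33 0.71 0.67 0.70
12/negative/3.jpg 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0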

Data preparation

Download the training data:

http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/ (face bounding-box data).

http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html (landmark data). The author found some errors in the CelebA annotations and used http://mmlab.ie.cuhk.edu.hk/archive/CNN_FacePoint.htm instead.

Put the downloaded training data in the MTCNN-Tensorflow/prepare_data directory.

gen_12net_data.py

Generates the positive, negative, and part sample crops. For each image, 50 negative crops are generated (np.max(Iou) < 0.3) and saved under ../../DATA/12/negative.

Positive crops (IoU >= 0.65) are saved under ../../DATA/12/positive/, and part crops (IoU >= 0.4 but below 0.65) are saved under ../../DATA/12/part/.

wider_face_train.txt is the annotation file; each line contains the image filename followed by bbox1, bbox2, ..., where each bbox consists of four coordinates (top-left and bottom-right corners):

im_path = annotation[0]
#print(im_path)
# convert box coordinates to float
bbox = list(map(float, annotation[1:]))
# ground-truth boxes, one row per box: (x1, y1, x2, y2)
boxes = np.array(bbox, dtype=np.float32).reshape(-1, 4)
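
The IoU used for this assignment is the standard intersection-over-union between a candidate crop and each ground-truth box. A minimal sketch of such a helper (the repo has its own equivalent; the signature here is illustrative):

import numpy as np

def iou(box, gt_boxes):
    # box: candidate crop (x1, y1, x2, y2); gt_boxes: array of shape (N, 4)
    box_area = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)
    gt_area = (gt_boxes[:, 2] - gt_boxes[:, 0] + 1) * (gt_boxes[:, 3] - gt_boxes[:, 1] + 1)
    # intersection rectangle between the crop and every ground-truth box
    xx1 = np.maximum(box[0], gt_boxes[:, 0])
    yy1 = np.maximum(box[1], gt_boxes[:, 1])
    xx2 = np.minimum(box[2], gt_boxes[:, 2])
    yy2 = np.minimum(box[3], gt_boxes[:, 3])
    w = np.maximum(0, xx2 - xx1 + 1)
    h = np.maximum(0, yy2 - yy1 + 1)
    inter = w * h
    return inter / (box_area + gt_area - inter)

A crop is negative if np.max(iou(crop, boxes)) < 0.3, positive if the maximum IoU is >= 0.65, and a part sample if it is >= 0.4.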

gen_landmark_aug_12.py

Generates the landmark training samples. Run it with:

python gen_landmark_aug_12.py

The input is trainImageList.txt, which stores the image path, bounding box, and landmarks.

The script crops the face according to the bbox, normalizes the landmarks relative to the cropped face, and saves the face crops and landmarks. It also augments the data by randomly cropping the face within a small range, flipping, rotating, etc. (a flip sketch is given after the normalization code below).

The face crop is taken from the bounding box:

f_face = img[bbox.top:bbox.bottom+1,bbox.left:bbox.right+1]

The landmarks are normalized:

# normalize landmarks by dividing by the width and height of the ground-truth bounding box
# landmarkGt is a list of (x, y) tuples
for index, one in enumerate(landmarkGt):
    # ((x - bbox.left) / width of bounding box, (y - bbox.top) / height of bounding box)
    rv = ((one[0]-gt_box[0])/(gt_box[2]-gt_box[0]), (one[1]-gt_box[1])/(gt_box[3]-gt_box[1]))
    # put the normalized value into the new list landmark
    landmark[index] = rv
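
For the flip augmentation, mirroring the crop horizontally turns a normalized x coordinate into 1 - x, and the left/right landmarks have to be swapped. A minimal sketch under the usual 5-point ordering (left eye, right eye, nose, left/right mouth corner); names here are illustrative:

import cv2
import numpy as np

def flip(face, landmark):
    # face: cropped face image; landmark: (5, 2) array of normalized landmarks
    face_flipped = cv2.flip(face, 1)                    # mirror horizontally
    landmark_ = np.asarray([(1 - x, y) for (x, y) in landmark])
    landmark_[[0, 1]] = landmark_[[1, 0]]               # swap left/right eye
    landmark_[[3, 4]] = landmark_[[4, 3]]               # swap left/right mouth corner
    return face_flipped, landmark_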

gen_imglist_pnet.py

Merges the sample boxes and landmark samples generated above into a single list file.

gen_PNet_tfrecords.py

Converts the merged training list into the tfrecords file used for PNet training.
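
Each sample is serialized as a tf.train.Example holding the encoded image plus the three labels. A minimal TF 1.x sketch of what one record could look like (the feature keys and file name are assumptions for illustration, not necessarily the exact ones used by the repo):

import tensorflow as tf

def to_example(image_bytes, cls_label, bbox_label, landmark_label):
    # image_bytes: encoded image; the three labels follow the format described above
    return tf.train.Example(features=tf.train.Features(feature={
        'image/encoded': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
        'image/label': tf.train.Feature(int64_list=tf.train.Int64List(value=[cls_label])),
        'image/roi': tf.train.Feature(float_list=tf.train.FloatList(value=bbox_label)),
        'image/landmark': tf.train.Feature(float_list=tf.train.FloatList(value=landmark_label)),
    }))

# writer = tf.python_io.TFRecordWriter('train_PNet_landmark.tfrecord')
# writer.write(to_example(img_bytes, 1, [0.05, -0.1, 0.02, 0.08], [0.0] * 10).SerializeToString())
# writer.close()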

gen_hard_example

Reads wider_face_train_bbx_gt.txt and runs the PNet model trained above on every image listed in it. The resulting detections are used as candidate boxes and, according to their IoU with the ground-truth boxes, are turned into positive, negative, and part training samples for RNet.
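
For the positive and part samples produced here (and by gen_12net_data.py above), the bbox_label stores the offsets of the matched ground-truth box relative to the candidate crop, normalized by the crop width and height. A minimal sketch of that target computation (names are illustrative):

def bbox_offsets(crop, gt):
    # crop: candidate box (nx1, ny1, nx2, ny2); gt: matched ground-truth box (x1, y1, x2, y2)
    nx1, ny1, nx2, ny2 = crop
    x1, y1, x2, y2 = gt
    w = float(nx2 - nx1)
    h = float(ny2 - ny1)
    return [(x1 - nx1) / w, (y1 - ny1) / h,
            (x2 - nx2) / w, (y2 - ny2) / h]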

gen_landmark_aug_24.py

Reads the landmarks from trainImageList.txt, crops the face boxes, and resizes them to 24×24 for RNet training.

gen_imglist_rnet.py

Merges the positive, negative, and part samples with the face crop images and landmarks produced by gen_landmark_aug_24.py.

gen_RNet_tfrecords.py

Converts the training data to tfrecords format. It has to be run 4 times to generate the neg, pos, part, and landmark tfrecords separately.

gen_hard_example

Reads wider_face_train_bbx_gt.txt and runs the trained PNet and RNet models on every image listed in it. The resulting detections are used as candidate boxes and, according to their IoU with the ground-truth boxes, are turned into positive, negative, and part training samples for ONet.

gen_landmark_aug_48.py

Reads the landmarks from trainImageList.txt, crops the face boxes, and resizes them to 48×48 for ONet training.

gen_imglist_onet.py

Merges the positive, negative, and part samples with the face crop images and landmarks produced by gen_landmark_aug_48.py.

gen_ONet_tfrecords.py

Converts the training data to tfrecords format. It has to be run 4 times to generate the neg, pos, part, and landmark tfrecords separately.

For PNet, the ratio of pos, part, landmark, and neg samples is roughly 1:1:1:3, so they can be merged into a single tfrecords file for training.

For RNet and ONet, on the other hand, the four kinds of training data are unbalanced, so during training each mini-batch

reads 64 samples each from the pos, part, and landmark tfrecords and 192 samples from the neg tfrecord.

The training data is stored in the format:

[path to image][cls_label][bbox_label][landmark_label]

For pos samples, cls_label=1, bbox_label is computed, landmark_label=[0,0,0,0,0,0,0,0,0,0]; there are no ground-truth landmarks, and these samples are used to train the box branch.

For part samples, cls_label=-1, bbox_label is computed, landmark_label=[0,0,0,0,0,0,0,0,0,0]; there are no ground-truth landmarks, and these samples are used to train the box branch.

For landmark samples, cls_label=-2, bbox_label=[0,0,0,0], landmark_label is computed; there is no box regression target (at inference the landmarks are predicted from the predicted box), and these samples are used to train the landmark branch.

For neg samples, cls_label=0, bbox_label=[0,0,0,0], landmark_label=[0,0,0,0,0,0,0,0,0,0].

The cls_label distinguishes these sample types, so that later each loss term can be computed on the appropriate subset of the data.
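
In the training script the three OHEM losses described below are then combined into a single objective. A minimal sketch with illustrative weights (an assumption: MTCNN typically down-weights the bbox and landmark terms for PNet/RNet and raises the landmark weight for ONet):

def total_loss(cls_loss, bbox_loss, landmark_loss,
               cls_w=1.0, bbox_w=0.5, landmark_w=0.5):
    # cls_loss / bbox_loss / landmark_loss are the OHEM losses defined below;
    # the weights shown are illustrative, not the repo's exact configuration
    return cls_w * cls_loss + bbox_w * bbox_loss + landmark_w * landmark_loss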

Reading the training data

For the pos, neg, part, and landmark training data, a different number of samples is read per batch:

pos_batch_size = int(np.ceil(config.BATCH_SIZE*pos_radio))
assert pos_batch_size != 0,"Batch Size Error "
part_batch_size = int(np.ceil(config.BATCH_SIZE*part_radio))
assert part_batch_size != 0,"Batch Size Error "
neg_batch_size = int(np.ceil(config.BATCH_SIZE*neg_radio))
assert neg_batch_size != 0,"Batch Size Error "
landmark_batch_size = int(np.ceil(config.BATCH_SIZE*landmark_radio))
assert landmark_batch_size != 0,"Batch Size Error "
batch_sizes = [pos_batch_size,part_batch_size,neg_batch_size,landmark_batch_size]
image_batch, label_batch, bbox_batch,landmark_batch = read_multi_tfrecords(dataset_dirs,batch_sizes, net) 

Reading a batch of data:

def read_multi_tfrecords(tfrecord_files, batch_sizes, net):
    pos_dir,part_dir,neg_dir,landmark_dir = tfrecord_files
    pos_batch_size,part_batch_size,neg_batch_size,landmark_batch_size = batch_sizes
    #assert net=='RNet' or net=='ONet', "only for RNet and ONet"
    pos_image,pos_label,pos_roi,pos_landmark = read_single_tfrecord(pos_dir, pos_batch_size, net)
    print(pos_image.get_shape())
    part_image,part_label,part_roi,part_landmark = read_single_tfrecord(part_dir, part_batch_size, net)
    print(part_image.get_shape())
    neg_image,neg_label,neg_roi,neg_landmark = read_single_tfrecord(neg_dir, neg_batch_size, net)
    print(neg_image.get_shape())
    landmark_image,landmark_label,landmark_roi,landmark_landmark = read_single_tfrecord(landmark_dir, landmark_batch_size, net)
    print(landmark_image.get_shape())

    images = tf.concat([pos_image,part_image,neg_image,landmark_image], 0, name="concat/image")
    print(images.get_shape())
    labels = tf.concat([pos_label,part_label,neg_label,landmark_label],0,name="concat/label")
    print(labels.get_shape())
    rois = tf.concat([pos_roi,part_roi,neg_roi,landmark_roi],0,name="concat/roi")
    print( rois.get_shape())
    landmarks = tf.concat([pos_landmark,part_landmark,neg_landmark,landmark_landmark],0,name="concat/landmark")
    return images,labels,rois,landmarks
    

Loss function computation

The face classification branch uses a cross-entropy loss and only uses the pos and neg samples:

def cls_ohem(cls_prob, label):
    zeros = tf.zeros_like(label)
    #label=-1 --> label=0

    #pos -> 1, neg -> 0, others -> 0
    label_filter_invalid = tf.where(tf.less(label,0), zeros, label)
    num_cls_prob = tf.size(cls_prob)
    cls_prob_reshape = tf.reshape(cls_prob,[num_cls_prob,-1])
    label_int = tf.cast(label_filter_invalid,tf.int32)
    # get the number of rows of class_prob
    num_row = tf.to_int32(cls_prob.get_shape()[0])
    #row = [0,2,4.....]
    row = tf.range(num_row)*2
    indices_ = row + label_int
    label_prob = tf.squeeze(tf.gather(cls_prob_reshape, indices_))
    loss = -tf.log(label_prob+1e-10)
    zeros = tf.zeros_like(label_prob, dtype=tf.float32)
    ones = tf.ones_like(label_prob,dtype=tf.float32)
    # set pos and neg to be 1, rest to be 0
    valid_inds = tf.where(label < zeros,zeros,ones)
    # get the number of POS and NEG examples
    num_valid = tf.reduce_sum(valid_inds)

    keep_num = tf.cast(num_valid*num_keep_radio,dtype=tf.int32)
    #FILTER OUT PART AND LANDMARK DATA
    loss = loss * valid_inds
    loss,_ = tf.nn.top_k(loss, k=keep_num)
    return tf.reduce_mean(loss)

Only the top num_keep_radio fraction of the valid samples, i.e. the ones with the largest loss, is kept for training (online hard example mining).
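
For example, with the RNet/ONet mini-batch described above (64 pos + 64 part + 64 landmark + 192 neg = 384 samples) and assuming num_keep_radio = 0.7 (the common MTCNN setting), only the pos and neg samples are valid for this loss:

num_valid = 64 + 192              # pos + neg samples valid for the classification loss
keep_num = int(num_valid * 0.7)   # = 179 hardest samples kept by tf.nn.top_k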

The bbox regression branch only uses the pos and part samples for training:

#label=1 or label=-1 then do regression
def bbox_ohem(bbox_pred,bbox_target,label):
    '''

    :param bbox_pred:
    :param bbox_target:
    :param label: class label
    :return: mean euclidean loss for all the pos and part examples
    '''
    zeros_index = tf.zeros_like(label, dtype=tf.float32)
    ones_index = tf.ones_like(label,dtype=tf.float32)
    # keep pos and part examples
    valid_inds = tf.where(tf.equal(tf.abs(label), 1),ones_index,zeros_index)
    #(batch,)
    #calculate square sum
    square_error = tf.square(bbox_pred-bbox_target)
    square_error = tf.reduce_sum(square_error,axis=1)
    #keep_num scalar
    num_valid = tf.reduce_sum(valid_inds)
    #keep_num = tf.cast(num_valid*num_keep_radio,dtype=tf.int32)
    # count the number of pos and part examples
    keep_num = tf.cast(num_valid, dtype=tf.int32)
    #keep valid index square_error
    square_error = square_error*valid_inds
    # keep top k examples, k equals to the number of positive examples
    _, k_index = tf.nn.top_k(square_error, k=keep_num)
    square_error = tf.gather(square_error, k_index)

    return tf.reduce_mean(square_error)

The landmark branch only uses samples with label=-2, i.e. the landmark data, for training:

def landmark_ohem(landmark_pred,landmark_target,label):
    '''

    :param landmark_pred:
    :param landmark_target:
    :param label:
    :return: mean euclidean loss
    '''
    #keep label =-2  then do landmark detection
    ones = tf.ones_like(label,dtype=tf.float32)
    zeros = tf.zeros_like(label,dtype=tf.float32)
    valid_inds = tf.where(tf.equal(label,-2),ones,zeros)
    square_error = tf.square(landmark_pred-landmark_target)
    square_error = tf.reduce_sum(square_error,axis=1)
    num_valid = tf.reduce_sum(valid_inds)
    #keep_num = tf.cast(num_valid*num_keep_radio,dtype=tf.int32)
    keep_num = tf.cast(num_valid, dtype=tf.int32)
    square_error = square_error*valid_inds
    _, k_index = tf.nn.top_k(square_error, k=keep_num)
    square_error = tf.gather(square_error, k_index)
    return tf.reduce_mean(square_error)
