Background: MTCNN training is implemented in Python, and we need to modify its network structure.

Goal: understand the MTCNN Python code.

1. Code structure

1.1 TensorFlow and device settings

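import os
# Expose only physical GPU 1; set this before TensorFlow initializes the GPUs
os.environ['CUDA_VISIBLE_DEVICES'] = '1'
import tensorflow as tf  # TensorFlow 1.x API (tf.ConfigProto, tf.Session)
    ...
    file_paths = get_model_filenames(args.model_dir)
    with tf.device('/gpu:0'):  # first *visible* GPU, i.e. physical GPU 1
        with tf.Graph().as_default():
            # allow ops without a GPU kernel to fall back to the CPU
            config = tf.ConfigProto(allow_soft_placement=True)
            with tf.Session(config=config) as sess: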

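Note: the physical GPU that is used is selected by the value assigned in os.environ['CUDA_VISIBLE_DEVICES']='1', not by the later with tf.device('/gpu:0'):. CUDA_VISIBLE_DEVICES restricts the process to the listed GPUs and renumbers them starting from 0, so inside TensorFlow '/gpu:0' means the first visible GPU, which here is physical GPU 1.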
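CUDA_VISIBLE_DEVICES hides all GPUs except the listed ones and renumbers those from 0 in the order given. TensorFlow's '/gpu:N' therefore always counts only visible GPUs. The minimal pure-Python sketch below makes that mapping explicit. physical_gpu_id is a hypothetical helper that is not part of the MTCNN code or of TensorFlow. It assumes the variable holds comma-separated integer ids and ignores the GPU-UUID form that CUDA also accepts.

```python
import os
from typing import Optional


def physical_gpu_id(logical_index: int, visible: Optional[str] = None) -> int:
    """Hypothetical helper: map TensorFlow's '/gpu:<logical_index>' to the
    physical GPU id, given a CUDA_VISIBLE_DEVICES value such as '1' or '2,3'."""
    if visible is None:
        visible = os.environ.get('CUDA_VISIBLE_DEVICES')
    if visible is None:
        # Variable unset: every GPU is visible and no renumbering happens
        return logical_index
    ids = [int(tok) for tok in visible.split(',') if tok.strip()]
    if not 0 <= logical_index < len(ids):
        raise ValueError('/gpu:%d is not visible (CUDA_VISIBLE_DEVICES=%r)'
                         % (logical_index, visible))
    # Visible GPUs are renumbered 0, 1, ... in the order they are listed
    return ids[logical_index]


print(physical_gpu_id(0, '1'))    # -> 1
print(physical_gpu_id(1, '2,3'))  # -> 3
```

With the setting used above, '/gpu:0' therefore runs on physical GPU 1.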

1.2 Setting up placeholders and output tensors

image_pnet = tf.placeholder(
	tf.float32, [None, None, None, 3])
pnet = PNet({'data': image_pnet}, mode='test')
out_tensor_pnet = pnet.get_all_output()

image_rnet = tf.placeholder(tf.float32, [None, 24, 24, 3])
rnet = RNet({'data': image_rnet}, mode='test')
out_tensor_rnet = rnet.get_all_output()

image_onet = tf.placeholder(tf.float32, [None, 48, 48, 3])
onet = ONet({'data': image_onet}, mode='test')
out_tensor_onet = onet.get_all_output()

1.3 Network savers

saver_pnet = tf.train.Saver(
	[v for v in tf.global_variables()
	 if v.name[0:5] == "pnet/"])
saver_rnet = tf.train.Saver(
	[v for v in tf.global_variables()
	 if v.name[0:5] == "rnet/"])
saver_onet = tf.train.Saver(
	[v for v in tf.global_variables()
	 if v.name[0:5] == "onet/"])

saver_pnet.restore(sess, file_paths[0])
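Each Saver is given only the variables whose names start with its network's scope. This allows the three checkpoints to be restored independently into one graph. The slice v.name[0:5] works here only because "pnet", "rnet" and "onet" all have four characters. str.startswith does the same job without depending on the length. The sketch below uses plain name strings in place of tf.Variable objects so that it runs without TensorFlow. select_scope is a hypothetical helper.

```python
from typing import List


def select_scope(names: List[str], scope: str) -> List[str]:
    """Hypothetical helper: keep variable names that live under `scope/`."""
    prefix = scope.rstrip('/') + '/'
    return [n for n in names if n.startswith(prefix)]


names = ['pnet/conv1/weights:0', 'rnet/conv1/weights:0',
         'onet/conv1/weights:0', 'pnet/conv1/biases:0']
print(select_scope(names, 'pnet'))
# ['pnet/conv1/weights:0', 'pnet/conv1/biases:0']
```

With real variables, the same filter would be written as [v for v in tf.global_variables() if v.name.startswith('pnet/')].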

1.4 Defining the network functions

def pnet_fun(img): return sess.run(
	out_tensor_pnet, feed_dict={image_pnet: img})
saver_rnet.restore(sess, file_paths[1])

def rnet_fun(img): return sess.run(
	out_tensor_rnet, feed_dict={image_rnet: img})
saver_onet.restore(sess, file_paths[2])

def onet_fun(img): return sess.run(
	out_tensor_onet, feed_dict={image_onet: img})
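Each *_fun closes over sess and the matching placeholder/output pair. As a result, detect_face can call the three networks as plain functions without knowing anything about TensorFlow. The interleaved restore calls run immediately, while the functions only call sess.run later when detect_face invokes them.

The same wrapping can be written once as a factory. make_net_fun is a hypothetical helper. It is shown below with a stand-in for sess.run so that it runs without TensorFlow.

```python
from typing import Any, Callable


def make_net_fun(run: Callable[..., Any], fetches: Any,
                 placeholder: Any) -> Callable[[Any], Any]:
    """Hypothetical helper: turn run(fetches, feed_dict=...) into img -> outputs."""
    def net_fun(img: Any) -> Any:
        return run(fetches, feed_dict={placeholder: img})
    return net_fun


# Stand-in for sess.run that just echoes what it was asked to compute
def fake_run(fetches, feed_dict):
    return fetches, feed_dict


pnet_fun = make_net_fun(fake_run, 'out_tensor_pnet', 'image_pnet')
print(pnet_fun('img'))  # ('out_tensor_pnet', {'image_pnet': 'img'})
```

With TensorFlow, the call would be pnet_fun = make_net_fun(sess.run, out_tensor_pnet, image_pnet).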

1.5 Getting candidate boxes from detect_face

rectangles, points = detect_face(img, args.minsize,
		 pnet_fun, rnet_fun, onet_fun,
		 args.threshold, args.factor)
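args.minsize and args.factor control the image pyramid that PNet is run on. The sketch below follows the scale computation used in common MTCNN implementations, such as the detect_face function in facenet's align module. It is an assumption that this repository's detect_face builds its pyramid the same way. The defaults of 20 and 0.709 are commonly used values and are also assumed here.

The algorithm works as follows:
- A face of minsize pixels is mapped onto PNet's 12×12 input.
- The image is then shrunk repeatedly by factor.
- Shrinking stops once its shorter side would fall below 12 pixels.

```python
from typing import List


def pyramid_scales(height: int, width: int, minsize: float = 20,
                   factor: float = 0.709) -> List[float]:
    """Scales at which PNet is applied (sketch of the usual MTCNN pyramid)."""
    m = 12.0 / minsize            # a minsize-pixel face becomes 12x12
    min_side = min(height, width) * m
    scales = []
    k = 0
    while min_side >= 12:         # stop once the image is smaller than PNet's window
        scales.append(m * factor ** k)
        min_side *= factor
        k += 1
    return scales


scales = pyramid_scales(100, 100, minsize=20, factor=0.709)
print(len(scales), [round(s, 4) for s in scales])
```

A smaller minsize or a factor closer to 1 produces more scales, so PNet runs more times and detection gets slower.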

2. placeholder and out_tensor

2.1 tf.placeholder

tf.placeholder(dtype, shape=None, name=None)

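A placeholder in TensorFlow acts like a function parameter. It has no value of its own, and a value must be fed for it (through feed_dict) whenever a tensor that depends on it is run.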

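image_pnet = tf.placeholder(tf.float32, [None, None, None, 3]) declares a 4-D float32 tensor (batch, height, width, channels). Only the last dimension, the 3 color channels, is fixed. The batch size, height and width are left unspecified (None).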

image_rnet = tf.placeholder(tf.float32, [None, 24, 24, 3])

image_onet = tf.placeholder(tf.float32, [None, 48, 48, 3])

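The shapes show that RNet and ONet take fixed 24×24 and 48×48 inputs. PNet corresponds to 12×12, but that size is only used for its training crops. At detection time, PNet is applied to whole (rescaled) images of arbitrary size, so its height and width are None.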
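Because PNet consists only of convolution and pooling layers, it accepts an input of any height and width. The sketch below computes the size of its output map for the original PNet structure listed in section 4. That structure is a 3×3 VALID convolution, then a 2×2 stride-2 SAME max-pool, then two more 3×3 VALID convolutions. The sketch uses TensorFlow's output-size rules: VALID gives n - k + 1 for stride 1, and SAME gives ceil(n / stride). Each output position corresponds to one 12×12 window of the input. pnet_map_size is a hypothetical helper.

```python
import math


def pnet_map_size(n: int) -> int:
    """Output map size of the original PNet for an n x n input (sketch)."""
    n = n - 3 + 1              # conv 3x3, stride 1, VALID
    n = math.ceil(n / 2)       # max-pool 2x2, stride 2, SAME
    n = n - 3 + 1              # conv 3x3, stride 1, VALID
    n = n - 3 + 1              # conv 3x3, stride 1, VALID
    return n


print(pnet_map_size(12))   # -> 1: a 12x12 training crop yields a single prediction
print(pnet_map_size(100))  # -> 45: a larger image yields a whole map of predictions
```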

2.2 PNet, RNet, ONet

These classes are imported with from src.mtcnn import PNet, RNet, ONet.

2.2.1 pnet = PNet({'data': image_pnet}, mode='test')

PNet is defined in src/mtcnn.py:

# src/mtcnn.py
class PNet(NetWork):
    def setup(self, task='data', reuse=False):
        ...
        if self.mode == 'train':
            ...
        else:
            ...
            self.out_put.append(self.get_output())

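pnet is an instance of PNet. Its 'data' key is bound to the image_pnet placeholder. Because mode='test' rather than 'train', setup() takes the else branch, which appends the network's output to self.out_put.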

2.2.2 out_tensor_pnet = pnet.get_all_output()

#src/mtcnn.py
class NetWork(object):
    ...
    def get_all_output(self):
        return self.out_put
    ...

    def get_output(self):
        return self.terminals[-1]

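self.out_put is the list of the network's final outputs, filled in once the whole structure has been defined. get_output() returns self.terminals[-1], the output of the most recently added layer, and get_all_output() returns the whole list.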

For PNet, the input is image_pnet and the output is out_tensor_pnet.
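The bookkeeping behind terminals, get_output() and out_put can be illustrated with a toy class. ToyNetwork is hypothetical and only mimics that bookkeeping. The real NetWork builds TensorFlow ops, whereas here a "layer" is just a Python function applied to the previous output.

```python
from typing import Any, Callable, List


class ToyNetwork:
    """Hypothetical stand-in for NetWork's bookkeeping, without TensorFlow."""

    def __init__(self, data: Any):
        self.terminals: List[Any] = [data]   # outputs of the layers added so far
        self.out_put: List[Any] = []         # outputs exposed to the caller

    def layer(self, fn: Callable[[Any], Any]) -> 'ToyNetwork':
        # Each new layer consumes the latest output and appends its own
        self.terminals.append(fn(self.get_output()))
        return self                          # return self so calls can be chained

    def get_output(self) -> Any:
        return self.terminals[-1]

    def get_all_output(self) -> List[Any]:
        return self.out_put


net = ToyNetwork(3)
net.layer(lambda x: x * 2).layer(lambda x: x + 1)
net.out_put.append(net.get_output())   # what the 'test' branch of setup() does
print(net.get_all_output())  # [7]
```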

3. Saving and restoring models with tf.train.Saver

https://www.cnblogs.com/denny402/p/6940134.html

https://blog.csdn.net/index20001/article/details/74322198

4. Network structure definitions

PNet: original structure

Input size (H×W×C)	Kernel (h×w×out channels)	Stride	Padding
12×12×3	3×3×10	1	VALID
10×10×10	max-pool 2×2	2	SAME
5×5×10	3×3×16	1	VALID
3×3×16	3×3×32	1	VALID
1×1×32 (output)

PNet: improved structure

Input size (H×W×C)	Kernel (h×w×out channels)	Stride	Padding
12×12×3	3×3×10	1	VALID
10×10×10	3×3×10	2	SAME
5×5×10	3×3×16	1	VALID
3×3×16	3×3×32	1	VALID
1×1×32 (output)

PNet: final structure. Only 3×3 convolutions are used. Every layer uses SAME padding except the last one, which uses VALID.

Input size (H×W×C)	Kernel (h×w×out channels)	Stride	Padding
12×12×3	3×3×10	1	SAME
12×12×10	3×3×16	2	SAME
6×6×16	3×3×32	2	SAME
3×3×32	3×3×32	1	VALID
1×1×32 (output)

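Note: this change also changes the network's overall stride. The original PNet downsamples only once, in the 2×2 max-pool, for a total stride of 2. The final structure has two stride-2 convolutions, for a total stride of 2 × 2 = 4. The stride used by generateBoundingBox in tools must therefore be changed to 4.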
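The overall stride is the product of the per-layer strides. It determines how a position on PNet's output map is mapped back to a window in the input image. The sketch below is a simplified, hypothetical version of that mapping: window_for_cell is not the actual generateBoundingBox in tools, which also applies the regression offsets and the score threshold. It assumes the map was computed on an image resized by scale, with a window (cell) size of 12.

```python
import math
from typing import Sequence, Tuple

# Per-layer strides, transcribed from the PNet tables in this section
ORIGINAL_PNET = [1, 2, 1, 1]   # conv1, 2x2 max-pool, conv2, conv3
FINAL_PNET = [1, 2, 2, 1]      # conv1, pool1_conv1, conv2, conv3


def total_stride(strides: Sequence[int]) -> int:
    """Overall stride of a chain of layers = product of per-layer strides."""
    return math.prod(strides)


def window_for_cell(row: int, col: int, stride: int, scale: float,
                    cellsize: int = 12) -> Tuple[int, int, int, int]:
    """Hypothetical, simplified mapping of output-map cell (row, col),
    computed on an image resized by `scale`, back to (x1, y1, x2, y2)
    in the original image (no box regression, no score threshold)."""
    x1 = round(col * stride / scale)
    y1 = round(row * stride / scale)
    x2 = round((col * stride + cellsize) / scale)
    y2 = round((row * stride + cellsize) / scale)
    return x1, y1, x2, y2


print(total_stride(ORIGINAL_PNET), total_stride(FINAL_PNET))  # 2 4
print(window_for_cell(3, 5, stride=4, scale=0.5))  # (40, 24, 64, 48)
print(window_for_cell(3, 5, stride=2, scale=0.5))  # (20, 12, 44, 36)
```

Suppose the old stride of 2 were kept for the final network. The same cell would then be placed with its top-left corner at half its true offset, which is why the stride has to be updated together with the network.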

Final trained structure:

all trainable variables:
<tf.Variable 'pnet/conv1/weights:0' shape=(3, 3, 3, 10) dtype=float32_ref>
<tf.Variable 'pnet/conv1/biases:0' shape=(10,) dtype=float32_ref>
<tf.Variable 'pnet/PReLU1/alpha:0' shape=(10,) dtype=float32_ref>
<tf.Variable 'pnet/pool1_conv1/weights:0' shape=(3, 3, 10, 16) dtype=float32_ref>
<tf.Variable 'pnet/pool1_conv1/biases:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'pnet/pool1_PReLU1/alpha:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'pnet/conv2/weights:0' shape=(3, 3, 16, 32) dtype=float32_ref>
<tf.Variable 'pnet/conv2/biases:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'pnet/PReLU2/alpha:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'pnet/conv3/weights:0' shape=(3, 3, 32, 32) dtype=float32_ref>
<tf.Variable 'pnet/conv3/biases:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'pnet/PReLU3/alpha:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'pnet/conv4-1/weights:0' shape=(1, 1, 32, 2) dtype=float32_ref>
<tf.Variable 'pnet/conv4-1/biases:0' shape=(2,) dtype=float32_ref>
<tf.Variable 'pnet/conv4-2/weights:0' shape=(1, 1, 32, 4) dtype=float32_ref>
<tf.Variable 'pnet/conv4-2/biases:0' shape=(4,) dtype=float32_ref>
<tf.Variable 'pnet/conv4-3/weights:0' shape=(1, 1, 32, 10) dtype=float32_ref>
<tf.Variable 'pnet/conv4-3/biases:0' shape=(10,) dtype=float32_ref>

RNet: original structure

Input size (H×W×C)	Kernel (h×w×out channels)	Stride	Padding
24×24×3	3×3×28	1	VALID
22×22×28	max-pool 3×3	2	SAME
11×11×28	3×3×48	1	VALID
9×9×48	max-pool 3×3	2	VALID
4×4×48	2×2×64	1	VALID
3×3×64 (output)

RNet: improved structure

Input size (H×W×C)	Kernel (h×w×out channels)	Stride	Padding
24×24×3	3×3×28	1	VALID
22×22×28	3×3×28	2	SAME
11×11×28	3×3×48	1	VALID
9×9×48	3×3×48	2	SAME
5×5×48	3×3×64	1	VALID
3×3×64 (output)

RNet: final structure

Input size (H×W×C)	Kernel (h×w×out channels)	Stride	Padding
24×24×3	3×3×28	1	SAME
24×24×28	3×3×28	2	SAME
12×12×28	3×3×48	1	SAME
12×12×48	3×3×48	2	SAME
6×6×48	3×3×64	2	SAME
3×3×64 (output)

Final trained structure:

all trainable variables:
<tf.Variable 'rnet/conv1/weights:0' shape=(3, 3, 3, 28) dtype=float32_ref>
<tf.Variable 'rnet/conv1/biases:0' shape=(28,) dtype=float32_ref>
<tf.Variable 'rnet/prelu1/alpha:0' shape=(28,) dtype=float32_ref>
<tf.Variable 'rnet/pool1_conv1/weights:0' shape=(3, 3, 28, 28) dtype=float32_ref>
<tf.Variable 'rnet/pool1_conv1/biases:0' shape=(28,) dtype=float32_ref>
<tf.Variable 'rnet/pool1_prelu1/alpha:0' shape=(28,) dtype=float32_ref>
<tf.Variable 'rnet/conv2/weights:0' shape=(3, 3, 28, 48) dtype=float32_ref>
<tf.Variable 'rnet/conv2/biases:0' shape=(48,) dtype=float32_ref>
<tf.Variable 'rnet/prelu2/alpha:0' shape=(48,) dtype=float32_ref>
<tf.Variable 'rnet/pool2_conv3/weights:0' shape=(3, 3, 48, 48) dtype=float32_ref>
<tf.Variable 'rnet/pool2_conv3/biases:0' shape=(48,) dtype=float32_ref>
<tf.Variable 'rnet/poo2_prelu3/alpha:0' shape=(48,) dtype=float32_ref>
<tf.Variable 'rnet/conv3/weights:0' shape=(3, 3, 48, 64) dtype=float32_ref>
<tf.Variable 'rnet/conv3/biases:0' shape=(64,) dtype=float32_ref>
<tf.Variable 'rnet/prelu3/alpha:0' shape=(64,) dtype=float32_ref>
<tf.Variable 'rnet/conv4/weights:0' shape=(576, 128) dtype=float32_ref>
<tf.Variable 'rnet/conv4/biases:0' shape=(128,) dtype=float32_ref>
<tf.Variable 'rnet/prelu4/alpha:0' shape=(128,) dtype=float32_ref>
<tf.Variable 'rnet/conv5-1/weights:0' shape=(128, 2) dtype=float32_ref>
<tf.Variable 'rnet/conv5-1/biases:0' shape=(2,) dtype=float32_ref>
<tf.Variable 'rnet/conv5-2/weights:0' shape=(128, 4) dtype=float32_ref>
<tf.Variable 'rnet/conv5-2/biases:0' shape=(4,) dtype=float32_ref>
<tf.Variable 'rnet/conv5-3/weights:0' shape=(128, 10) dtype=float32_ref>
<tf.Variable 'rnet/conv5-3/biases:0' shape=(10,) dtype=float32_ref>

ONet: original structure

Input size (H×W×C)	Kernel (h×w×out channels)	Stride	Padding
48×48×3	3×3×32	1	VALID
46×46×32	max-pool 3×3	2	SAME
23×23×32	3×3×64	1	VALID
21×21×64	max-pool 3×3	2	VALID
10×10×64	3×3×64	1	VALID
8×8×64	max-pool 2×2	2	SAME
4×4×64	2×2×128	1	VALID
3×3×128 (output)

ONet: improved structure

Input size (H×W×C)	Kernel (h×w×out channels)	Stride	Padding
48×48×3	3×3×32	1	VALID
46×46×32	3×3×32	2	SAME
23×23×32	3×3×64	1	VALID
21×21×64	3×3×64	2	VALID
10×10×64	3×3×128	2	SAME
5×5×128	3×3×128	1	VALID
3×3×128 (output)

ONet: final structure

Input size (H×W×C)	Kernel (h×w×out channels)	Stride	Padding
48×48×3	3×3×32	1	SAME
48×48×32	3×3×64	2	SAME
24×24×64	3×3×64	1	SAME
24×24×64	3×3×128	2	SAME
12×12×128	3×3×256	2	SAME
6×6×256	3×3×128	2	SAME
3×3×128 (output)

Final trained parameters:

<tf.Variable 'onet/conv1/weights:0' shape=(3, 3, 3, 32) dtype=float32_ref>
<tf.Variable 'onet/conv1/biases:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'onet/prelu1/alpha:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'onet/conv2/weights:0' shape=(3, 3, 32, 64) dtype=float32_ref>
<tf.Variable 'onet/conv2/biases:0' shape=(64,) dtype=float32_ref>
<tf.Variable 'onet/prelu2/alpha:0' shape=(64,) dtype=float32_ref>
<tf.Variable 'onet/conv3/weights:0' shape=(3, 3, 64, 64) dtype=float32_ref>
<tf.Variable 'onet/conv3/biases:0' shape=(64,) dtype=float32_ref>
<tf.Variable 'onet/prelu3/alpha:0' shape=(64,) dtype=float32_ref>
<tf.Variable 'onet/conv4/weights:0' shape=(3, 3, 64, 128) dtype=float32_ref>
<tf.Variable 'onet/conv4/biases:0' shape=(128,) dtype=float32_ref>
<tf.Variable 'onet/prelu4/alpha:0' shape=(128,) dtype=float32_ref>
<tf.Variable 'onet/conv5_/weights:0' shape=(3, 3, 128, 256) dtype=float32_ref>
<tf.Variable 'onet/conv5_/biases:0' shape=(256,) dtype=float32_ref>
<tf.Variable 'onet/prelu5_/alpha:0' shape=(256,) dtype=float32_ref>
<tf.Variable 'onet/conv6_/weights:0' shape=(3, 3, 256, 128) dtype=float32_ref>
<tf.Variable 'onet/conv6_/biases:0' shape=(128,) dtype=float32_ref>
<tf.Variable 'onet/prelu6_/alpha:0' shape=(128,) dtype=float32_ref>
<tf.Variable 'onet/conv5/weights:0' shape=(1152, 256) dtype=float32_ref>
<tf.Variable 'onet/conv5/biases:0' shape=(256,) dtype=float32_ref>
<tf.Variable 'onet/prelu5/alpha:0' shape=(256,) dtype=float32_ref>
<tf.Variable 'onet/conv6-1/weights:0' shape=(256, 2) dtype=float32_ref>
<tf.Variable 'onet/conv6-1/biases:0' shape=(2,) dtype=float32_ref>
<tf.Variable 'onet/conv6-2/weights:0' shape=(256, 4) dtype=float32_ref>
<tf.Variable 'onet/conv6-2/biases:0' shape=(4,) dtype=float32_ref>
<tf.Variable 'onet/conv6-3/weights:0' shape=(256, 10) dtype=float32_ref>
<tf.Variable 'onet/conv6-3/biases:0' shape=(10,) dtype=float32_ref>
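In the variable lists, the first fully connected layers are rnet/conv4 with shape (576, 128) and onet/conv5 with shape (1152, 256). Both take the flattened output of the last convolution as input. The sketch below walks the final RNet and ONet tables using TensorFlow's SAME-padding rule, output = ceil(input / stride), which does not depend on the kernel size. It then checks that the flattened sizes match: 3·3·64 = 576 and 3·3·128 = 1152. The layer lists are transcribed from the tables above, and walk_same is a hypothetical helper.

```python
import math
from typing import List, Tuple

# (output channels, stride) per 3x3 conv layer; every layer uses SAME padding
RNET_FINAL = [(28, 1), (28, 2), (48, 1), (48, 2), (64, 2)]
ONET_FINAL = [(32, 1), (64, 2), (64, 1), (128, 2), (256, 2), (128, 2)]


def walk_same(size: int, layers: List[Tuple[int, int]]) -> Tuple[int, int, int]:
    """Return (H, W, C) after a stack of SAME-padded convolutions on a
    size x size x 3 input."""
    channels = 3
    for c_out, stride in layers:
        size = math.ceil(size / stride)   # SAME: ceil(n / stride)
        channels = c_out
    return size, size, channels


for name, inp, layers in [('rnet', 24, RNET_FINAL), ('onet', 48, ONET_FINAL)]:
    h, w, c = walk_same(inp, layers)
    print(name, (h, w, c), 'flattened:', h * w * c)
# rnet (3, 3, 64) flattened: 576
# onet (3, 3, 128) flattened: 1152
```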

4.1 The definition of conv

# src/mtcnn.py, in class NetWork(object):
def conv(self, inp, k_h, k_w, c_o, s_h, s_w, name,
         task=None, relu=True, padding='SAME',
         group=1, biased=True, wd=None):
    # k_h, k_w: kernel size; c_o: number of kernels (output channels);
    # s_h, s_w: strides; padding: 'SAME' or 'VALID'
    self.validate_padding(padding)
    c_i = int(inp.get_shape()[-1])   # number of input channels (NHWC layout)
    assert c_i % group == 0
    assert c_o % group == 0

    def convolve(i, k):
        return tf.nn.conv2d(i, k, [1, s_h, s_w, 1], padding=padding)

    with tf.variable_scope(name) as scope:
        # // keeps the shape integral (c_i / group is a float in Python 3)
        kernel = self.make_var(
            'weights', shape=[k_h, k_w, c_i // group, c_o])
        if group == 1:
            output = convolve(inp, kernel)
        else:
            # grouped convolution: split input and kernel along channels
            input_groups = tf.split(inp, group, 3)
            kernel_groups = tf.split(kernel, group, 3)
            output_groups = [convolve(i, k) for i, k in
                             zip(input_groups, kernel_groups)]
            output = tf.concat(output_groups, 3)
        if (wd is not None) and (self.mode == 'train'):
            # L2 weight decay on the kernel, collected per task (training only)
            self.weight_decay[task].append(
                tf.multiply(tf.nn.l2_loss(kernel), wd))
        if biased:
            biases = self.make_var('biases', [c_o])
            output = tf.nn.bias_add(output, biases)
        if relu:
            output = tf.nn.relu(output, name=scope.name)
        return output

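padding='SAME' zero-pads the input so that, with stride 1, the output has the same height and width as the input. In general, the SAME output size is ceil(n / stride). 'VALID' adds no padding, so the output size is ceil((n - k + 1) / stride).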

The numeric arguments are the kernel size (k_h, k_w), the number of kernels, i.e. output channels (c_o), and the stride (s_h, s_w).
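The sketch below implements TensorFlow's output-size rules for SAME and VALID padding. It also computes how much zero padding SAME adds along one axis; TensorFlow puts the odd extra pixel after (bottom/right). conv_output_size and same_padding are hypothetical helper names. The results can be checked against the tables in section 4. For example, the improved PNet applies a 3×3 stride-2 SAME convolution to a 10×10 map and gets 5×5.

```python
import math
from typing import Tuple


def conv_output_size(n: int, k: int, s: int, padding: str) -> int:
    """Spatial output size of a conv/pool layer, following TensorFlow's rules."""
    if padding == 'SAME':
        return math.ceil(n / s)
    if padding == 'VALID':
        return math.ceil((n - k + 1) / s)
    raise ValueError('padding must be SAME or VALID')


def same_padding(n: int, k: int, s: int) -> Tuple[int, int]:
    """(before, after) zero padding that SAME adds along one spatial axis."""
    out = math.ceil(n / s)
    total = max((out - 1) * s + k - n, 0)
    return total // 2, total - total // 2   # the extra pixel goes after


print(conv_output_size(12, 3, 1, 'SAME'), same_padding(12, 3, 1))  # 12 (1, 1)
print(conv_output_size(10, 3, 2, 'SAME'), same_padding(10, 3, 2))  # 5 (0, 1)
print(conv_output_size(12, 3, 1, 'VALID'))                         # 10
```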

4.2 wd=self.weight_decay_coeff

Some layers in the network pass this argument and some do not; all of the layers that pass it are among the last few layers.

# src/mtcnn.py, in class NetWork(object):
# inside def conv
if (wd is not None) and (self.mode == 'train'):
	self.weight_decay[task].append(
		tf.multiply(tf.nn.l2_loss(kernel), wd))

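This is weight decay, i.e. L2 regularization, not a step size. In training mode, it appends wd times tf.nn.l2_loss(kernel) for this layer's kernel to the list self.weight_decay[task]. It only affects training, where it penalizes large weights. It does not change the network structure or inference.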
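tf.nn.l2_loss(t) computes sum(t ** 2) / 2. Each entry appended to self.weight_decay[task] is therefore wd · Σw² / 2 for one kernel. The sketch below reproduces that arithmetic in plain Python. It assumes, as is usual, that the collected terms are summed and added to the task's loss; that summation step is an assumption, not something shown in this code. The coefficient 0.001, the task key 'cls', the kernel values and the task loss are also hypothetical.

```python
from typing import Dict, List


def l2_loss(values: List[float]) -> float:
    """Same formula as tf.nn.l2_loss: half the sum of squares."""
    return sum(v * v for v in values) / 2


wd = 0.001                                        # hypothetical coefficient
weight_decay: Dict[str, List[float]] = {'cls': []}
# Two hypothetical (flattened) kernels from layers that pass wd
for kernel in ([1.0, -2.0, 2.0], [0.5, 0.5]):
    weight_decay['cls'].append(l2_loss(kernel) * wd)

task_loss = 0.25                                  # hypothetical classification loss
total_loss = task_loss + sum(weight_decay['cls'])
print(total_loss)
```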

MTCNN (Part 2): Python code analysis and structural changes