MTCNN(二)python代码解析及结构更改

版权声明:转载注明出处:邢翔瑞的技术博客https://blog.csdn.net/weixin_36474809/article/list/1 https://blog.csdn.net/weixin_36474809/article/details/82856171

背景:MTCNN的训练是在python上实现的,我们需要对其结构进行更改。

目的:读懂MTCNN的python代码。

一、代码结构

1.1 tensorflow设置与设备设置

import os  
os.environ['CUDA_VISIBLE_DEVICES']='1'
    ...
    file_paths = get_model_filenames(args.model_dir)
    with tf.device('/gpu:0'):
        with tf.Graph().as_default():
            config = tf.ConfigProto(allow_soft_placement=True)
            with tf.Session(config=config) as sess:

注意,关于GPU的device是在os.environ['CUDA_VISIBLE_DEVICES']='1'后面这个变量来更改的,而不是后面的with tf.device('/gpu:0'):,这个需要后面查找什么意思。

1.2 设置placeholder与out_tensor

image_pnet = tf.placeholder(
	tf.float32, [None, None, None, 3])
pnet = PNet({'data': image_pnet}, mode='test')
out_tensor_pnet = pnet.get_all_output()

image_rnet = tf.placeholder(tf.float32, [None, 24, 24, 3])
rnet = RNet({'data': image_rnet}, mode='test')
out_tensor_rnet = rnet.get_all_output()

image_onet = tf.placeholder(tf.float32, [None, 48, 48, 3])
onet = ONet({'data': image_onet}, mode='test')
out_tensor_onet = onet.get_all_output()

1.3 网络saver

saver_pnet = tf.train.Saver(
	[v for v in tf.global_variables()
	 if v.name[0:5] == "pnet/"])
saver_rnet = tf.train.Saver(
	[v for v in tf.global_variables()
	 if v.name[0:5] == "rnet/"])
saver_onet = tf.train.Saver(
	[v for v in tf.global_variables()
	 if v.name[0:5] == "onet/"])

saver_pnet.restore(sess, file_paths[0])

1.4 定义相应的网络fun

def pnet_fun(img): return sess.run(
	out_tensor_pnet, feed_dict={image_pnet: img})
saver_rnet.restore(sess, file_paths[1])

def rnet_fun(img): return sess.run(
	out_tensor_rnet, feed_dict={image_rnet: img})
saver_onet.restore(sess, file_paths[2])

def onet_fun(img): return sess.run(
	out_tensor_onet, feed_dict={image_onet: img})

1.5 用detect_face函数给出备选框

rectangles, points = detect_face(img, args.minsize,
		 pnet_fun, rnet_fun, onet_fun,
		 args.threshold, args.factor)

二、placeholder与out_tensor

2.1 tf.placeholder

tf.placeholder(dtype, shape=None, name=None)

placeholder,占位符,在tensorflow中类似于函数参数,运行时必须传入值。

image_pnet = tf.placeholder(tf.float32, [None, None, None, 3])意思就是类型为float32类型,四维的数组,最后一个维度为3。

image_rnet = tf.placeholder(tf.float32, [None, 24, 24, 3])

image_onet = tf.placeholder(tf.float32, [None, 48, 48, 3])

根据向量可以看出分别是12,24与48,但是第一层仅仅是在训练的时候用12*12来训练,所以维度为None

2.2 PNet,RNet,ONet

引入在from src.mtcnn import PNet, RNet, ONet

2.2.1 pnet = PNet({'data': image_pnet}, mode='test')

 定义在src/mtcnn.py之中

#src/mtcnn.py
class PNet(NetWork):
    def setup(self, task='data', reuse=False):
    ...
    if self.mode == 'train':
        ...
    else
        ...
        self.out_put.append(self.get_output())

pnet是具体化的PNet,其中的'data'对应于image_pnet,其中的mode对应于'train'的else

2.2.2 out_tensor_pnet = pnet.get_all_output()

#src/mtcnn.py
class NetWork(object):
    ...
    def get_all_output(self):
        return self.out_put
    ...

    def get_output(self):
        return self.terminals[-1]

self.output是最终定义完网络结构之后的最终的输出。

针对Pnet,输入为image_pnet,输出为out_tensor_pnet

三、tf.train.saver模型的保存与恢复

https://www.cnblogs.com/denny402/p/6940134.html

https://blog.csdn.net/index20001/article/details/74322198

四、网络的结构定义

Pnet原始结构

Feature size

Kernel size

Stride

Padding

12*12*3

3*3*10

1

Valid

10*10*10

Maxpool 2*2

2

Same

5*5*10

3*3*16

1

Valid

3*3*16

3*3*32

1

Valid

1*1*32

 

 

 

Pnet改进结构

Feature size

Kernel size

Stride

Padding

12*12*3

3*3*10

1

Valid

10*10*10

3*3*10

2

Same

5*5*10

3*3*16

1

Valid

3*3*16

3*3*32

1

Valid

1*1*32

 

 

 

Pnet  最终结构,只有3×3的卷积,padding为same

Feature size

Kernel size

Stride

Padding

12*12*3

3*3*10

1

Same

12*12*10

3*3*16

2

Same

6*6*16

3*3*32

2

Same

3*3*32

3*3*32

1

Valid

1*1*32

 

 

 

注意!代码更改之后stride也变了,所以需要更改tools之中的generateBoundingBox的stride的尺度,应该为4

最终训练结构:

all trainable variables:
<tf.Variable 'pnet/conv1/weights:0' shape=(3, 3, 3, 10) dtype=float32_ref>
<tf.Variable 'pnet/conv1/biases:0' shape=(10,) dtype=float32_ref>
<tf.Variable 'pnet/PReLU1/alpha:0' shape=(10,) dtype=float32_ref>
<tf.Variable 'pnet/pool1_conv1/weights:0' shape=(3, 3, 10, 16) dtype=float32_ref>
<tf.Variable 'pnet/pool1_conv1/biases:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'pnet/pool1_PReLU1/alpha:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'pnet/conv2/weights:0' shape=(3, 3, 16, 32) dtype=float32_ref>
<tf.Variable 'pnet/conv2/biases:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'pnet/PReLU2/alpha:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'pnet/conv3/weights:0' shape=(3, 3, 32, 32) dtype=float32_ref>
<tf.Variable 'pnet/conv3/biases:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'pnet/PReLU3/alpha:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'pnet/conv4-1/weights:0' shape=(1, 1, 32, 2) dtype=float32_ref>
<tf.Variable 'pnet/conv4-1/biases:0' shape=(2,) dtype=float32_ref>
<tf.Variable 'pnet/conv4-2/weights:0' shape=(1, 1, 32, 4) dtype=float32_ref>
<tf.Variable 'pnet/conv4-2/biases:0' shape=(4,) dtype=float32_ref>
<tf.Variable 'pnet/conv4-3/weights:0' shape=(1, 1, 32, 10) dtype=float32_ref>
<tf.Variable 'pnet/conv4-3/biases:0' shape=(10,) dtype=float32_ref>

Rnet原始结构

Feature size

Kernel size

Stride

Padding

24*24*3

3*3*28

1

Valid

22*22*28

maxPool  3*3

2

Same

11*11*28

3*3*48

1

Valid

9*9*48

maxPool  3*3

2

valid

4*4*48

2*2*64

1

Valid

3*3*64

 

 

 

Rnet改进结构

Feature size

Kernel size

Stride

Padding

24*24*3

3*3*28

1

Valid

22*22*28

3*3*28

2

Same

11*11*28

3*3*48

1

Valid

9*9*48

3*3*48

2

same

5*5*48

3*3*64

1

valid

3*3*64

 

 

 

Rnet最终结构

Feature size

Kernel size

Stride

Padding

24*24*3

3*3*28

1

Same

24*24*28

3*3*28 2 Same
12*12*28 3*3*48 1 Same
12*12*48 3*3*48 2 Same
6*6*48 3*3*64 2 Same
3*3*64

 

 

 

最终训练结构:

all trainable variables:
<tf.Variable 'rnet/conv1/weights:0' shape=(3, 3, 3, 28) dtype=float32_ref>
<tf.Variable 'rnet/conv1/biases:0' shape=(28,) dtype=float32_ref>
<tf.Variable 'rnet/prelu1/alpha:0' shape=(28,) dtype=float32_ref>
<tf.Variable 'rnet/pool1_conv1/weights:0' shape=(3, 3, 28, 28) dtype=float32_ref>
<tf.Variable 'rnet/pool1_conv1/biases:0' shape=(28,) dtype=float32_ref>
<tf.Variable 'rnet/pool1_prelu1/alpha:0' shape=(28,) dtype=float32_ref>
<tf.Variable 'rnet/conv2/weights:0' shape=(3, 3, 28, 48) dtype=float32_ref>
<tf.Variable 'rnet/conv2/biases:0' shape=(48,) dtype=float32_ref>
<tf.Variable 'rnet/prelu2/alpha:0' shape=(48,) dtype=float32_ref>
<tf.Variable 'rnet/pool2_conv3/weights:0' shape=(3, 3, 48, 48) dtype=float32_ref>
<tf.Variable 'rnet/pool2_conv3/biases:0' shape=(48,) dtype=float32_ref>
<tf.Variable 'rnet/poo2_prelu3/alpha:0' shape=(48,) dtype=float32_ref>
<tf.Variable 'rnet/conv3/weights:0' shape=(3, 3, 48, 64) dtype=float32_ref>
<tf.Variable 'rnet/conv3/biases:0' shape=(64,) dtype=float32_ref>
<tf.Variable 'rnet/prelu3/alpha:0' shape=(64,) dtype=float32_ref>
<tf.Variable 'rnet/conv4/weights:0' shape=(576, 128) dtype=float32_ref>
<tf.Variable 'rnet/conv4/biases:0' shape=(128,) dtype=float32_ref>
<tf.Variable 'rnet/prelu4/alpha:0' shape=(128,) dtype=float32_ref>
<tf.Variable 'rnet/conv5-1/weights:0' shape=(128, 2) dtype=float32_ref>
<tf.Variable 'rnet/conv5-1/biases:0' shape=(2,) dtype=float32_ref>
<tf.Variable 'rnet/conv5-2/weights:0' shape=(128, 4) dtype=float32_ref>
<tf.Variable 'rnet/conv5-2/biases:0' shape=(4,) dtype=float32_ref>
<tf.Variable 'rnet/conv5-3/weights:0' shape=(128, 10) dtype=float32_ref>
<tf.Variable 'rnet/conv5-3/biases:0' shape=(10,) dtype=float32_ref>

Onet原始结构

Feature size

Kernel size

Stride

Padding

48*48*3

3*3*32

1

Valid

46*46*32

maxPool  3*3

2

Same

23*23*32

3*3*64

1

Valid

21*21*64

maxPool  3*3

2

valid

10*10*64

3*3*64

1

Valid

8*8*64

maxPool  2*2

2

Same

4*4*64 2*2*128 1 valid
3*3*128      

Onet改进结构

Feature size

Kernel size

Stride

Padding

48*48*3

3*3*32

1

Valid

46*46*32

3*3*32

2

Same

23*23*32

3*3*64

1

valid

21*21*64

3*3*64

2

valid

10*10*64

3*3*128

2

same

5*5*128

3*3*128

1

valid

3*3*128      

Onet最终结构

Feature size

Kernel size

Stride

Padding

48*48*3

3*3*32

1

Same

48*48*32

3*3*64

2

Same

24*24*64

3*3*64

1

Same

24*24*64

3*3*128

2

Same

12*12*128

3*3*256

2

Same

6*6*256

3*3*128

2

Same
3*3*128      

最终训练参数

<tf.Variable 'onet/conv1/weights:0' shape=(3, 3, 3, 32) dtype=float32_ref>
<tf.Variable 'onet/conv1/biases:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'onet/prelu1/alpha:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'onet/conv2/weights:0' shape=(3, 3, 32, 64) dtype=float32_ref>
<tf.Variable 'onet/conv2/biases:0' shape=(64,) dtype=float32_ref>
<tf.Variable 'onet/prelu2/alpha:0' shape=(64,) dtype=float32_ref>
<tf.Variable 'onet/conv3/weights:0' shape=(3, 3, 64, 64) dtype=float32_ref>
<tf.Variable 'onet/conv3/biases:0' shape=(64,) dtype=float32_ref>
<tf.Variable 'onet/prelu3/alpha:0' shape=(64,) dtype=float32_ref>
<tf.Variable 'onet/conv4/weights:0' shape=(3, 3, 64, 128) dtype=float32_ref>
<tf.Variable 'onet/conv4/biases:0' shape=(128,) dtype=float32_ref>
<tf.Variable 'onet/prelu4/alpha:0' shape=(128,) dtype=float32_ref>
<tf.Variable 'onet/conv5_/weights:0' shape=(3, 3, 128, 256) dtype=float32_ref>
<tf.Variable 'onet/conv5_/biases:0' shape=(256,) dtype=float32_ref>
<tf.Variable 'onet/prelu5_/alpha:0' shape=(256,) dtype=float32_ref>
<tf.Variable 'onet/conv6_/weights:0' shape=(3, 3, 256, 128) dtype=float32_ref>
<tf.Variable 'onet/conv6_/biases:0' shape=(128,) dtype=float32_ref>
<tf.Variable 'onet/prelu6_/alpha:0' shape=(128,) dtype=float32_ref>
<tf.Variable 'onet/conv5/weights:0' shape=(1152, 256) dtype=float32_ref>
<tf.Variable 'onet/conv5/biases:0' shape=(256,) dtype=float32_ref>
<tf.Variable 'onet/prelu5/alpha:0' shape=(256,) dtype=float32_ref>
<tf.Variable 'onet/conv6-1/weights:0' shape=(256, 2) dtype=float32_ref>
<tf.Variable 'onet/conv6-1/biases:0' shape=(2,) dtype=float32_ref>
<tf.Variable 'onet/conv6-2/weights:0' shape=(256, 4) dtype=float32_ref>
<tf.Variable 'onet/conv6-2/biases:0' shape=(4,) dtype=float32_ref>
<tf.Variable 'onet/conv6-3/weights:0' shape=(256, 10) dtype=float32_ref>
<tf.Variable 'onet/conv6-3/biases:0' shape=(10,) dtype=float32_ref>

4.1 conv之中的定义

#src/mtcnn.py   in class NetWork(object):  
def conv(self, inp, k_h, k_w, c_o, s_h, s_w, name,
		 task=None, relu=True, padding='SAME',
		 group=1, biased=True, wd=None):

	self.validate_padding(padding)
	c_i = int(inp.get_shape()[-1])
	assert c_i % group == 0
	assert c_o % group == 0

	def convolve(i, k): return tf.nn.conv2d(
		i, k, [1, s_h, s_w, 1], padding=padding)
	with tf.variable_scope(name) as scope:
		kernel = self.make_var(
			'weights', shape=[
				k_h, k_w, c_i / group, c_o])
		if group == 1:
			output = convolve(inp, kernel)
		else:
			input_groups = tf.split(inp, group, 3)
			kernel_groups = tf.split(kernel, group, 3)
			output_groups = [convolve(i, k) for i, k in
							 zip(input_groups, kernel_groups)]
			output = tf.concat(output_groups, 3)
		if (wd is not None) and (self.mode == 'train'):
			self.weight_decay[task].append(
				tf.multiply(tf.nn.l2_loss(kernel), wd))
		if biased:
			biases = self.make_var('biases', [c_o])
			output = tf.nn.bias_add(output, biases)
		if relu:
			output = tf.nn.relu(output, name=scope.name)
		return output

padding='SAME'就是输入输出一样大,‘VALID’就是不进行padding

几个数字分别为卷积核的大小,卷积核的个数,卷积核时的stride

4.2 wd=self.weight_decay_coeff

网络中有的有此语句,有的没有此语句,该语句全都在后几层。

# src/mtcnn.py  in class class NetWork(object): 
# in    def conv
if (wd is not None) and (self.mode == 'train'):
	self.weight_decay[task].append(
		tf.multiply(tf.nn.l2_loss(kernel), wd))

是对权重的步长的设置,应该对结果没有太大影响。

猜你喜欢

转载自blog.csdn.net/weixin_36474809/article/details/82856171
今日推荐